Education
Relevant Courses: Geometric Based Methods for Vision (16-822), Advanced Multimodal Machine Learning (11-777), Visual Learning and Recognition (16-824), Robot Localization and Mapping (16-833), Computer Vision (16-720), Introduction to Machine Learning (10-601), Mathematical Fundamentals for Robotics (16-811)
Relevant Courses: Artificial Intelligence, Neural Networks, Data Warehousing and Data Mining, Data Structures and Algorithms
Experiences
I oversee the development of object detection, semantic segmentation, and tracking models for federal customers using aerial/satellite imagery. We collaborate closely with clients to understand their needs, design tailored models, and employ the latest computer vision techniques. Our goal is to provide accurate and reliable insights from visual data to aid their decision-making processes.
TA for the Graduate Computer Vision course 16-720 taught by Prof. John Galeotti. Responsibilities include preparing assignments, grading, and holding office hours for students.
Perception Intern at Uber Advanced Technologies Group (https://www.uber.com/info/atg/), the self-driving division of Uber.
Built a model to characterize LiDAR performance, identified the key factors that contribute to detector performance, and used that model to drive decisions about which specifications matter for next-generation sensors.
Tested transfer learning approaches so that models can be reused across LiDARs with different beam spacings and scanning patterns.
TA for the Graduate Computer Vision course 16-720 taught by Prof. Srinivasa Narasimhan. Responsibilities include preparing assignments, grading, and holding office hours for students.
Developed a multi-camera/multi-robot facility for Cozmo, an autonomous robot, by repurposing old phones to act as perched cameras.
Performed camera calibration and worked on SLAM. Created an independent server that shares a world map with its clients (the robots), helping them plan paths and navigate more effectively.
Worked on a contextual in-video advertising project that helps a potential advertiser place their brand in a contextually relevant video, at the least intrusive position within that video.
Created a context-aware ad recommendation and insertion system using multimodal analytics and semantic understanding of video content.
Handled the notification module in Shopclues' Point of Sale (POS) app by integrating Firebase Cloud Messaging and Firebase Notifications.
Integrated Firebase Analytics in the application to log important events.
Publications
- An Extensible Multi-Sensor Fusion Framework for 3D Imaging
Talha Ahmed Siddiqui, Rishi Madhok, Matthew Toole
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2020), WAD Workshop
- Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan, Rishi Madhok, Irfan Essa
IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2019), LUV Workshop
- Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan, Rishi Madhok, Irfan Essa
IEEE Winter Conference on Applications of Computer Vision (WACV), 2019
- Semantic Understanding for Contextual In-Video Advertising
Rishi Madhok, Shashank Mujumdar, Nitin Gupta, Sameep Mehta
AAAI Conference on Artificial Intelligence, 2018
- Proposing Contextually Relevant Quotes for Images
Shivali Goel, Rishi Madhok, Shweta Garg
European Conference on Information Retrieval (ECIR), 2018
- Content Driven Enrichment of Formal Text Using Concept Definitions and Applications
Abhinav Jain, Shashank Mujumdar, Nitin Gupta, Sameep Mehta, Rishi Madhok
ACM Conference on Hypertext and Social Media (HT), 2018
- SentiMozart: Music Generation Based on Emotions
Rishi Madhok, Shivali Goel, Shweta Garg
International Conference on Agents and Artificial Intelligence (ICAART), 2018
Research and Projects
- Explored and developed a suite of sensor fusion techniques around an emerging sensing technology known as a single-photon avalanche diode, or SPAD.
- Developed a novel sensor fusion algorithm to fuse input data from single-photon LiDARs, a stereo camera pair, and radar.
- Proposed a solution that forms an intermediate cost-volume representation from the different sensors, which, when passed through a deep stereo network (PSM-Net), estimates more accurate disparity and depth for the scene (a rough sketch follows these bullets).
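A minimal, hypothetical PyTorch sketch of this cost-volume idea is shown below. The feature shapes, the way a sparse LiDAR prior is injected as an extra channel, and the small 3D-conv/soft-argmin head are illustrative assumptions, not the project's actual implementation.

```python
# PSM-Net-style sketch: build a concatenation cost volume from stereo
# features (optionally with a sparse LiDAR prior channel), regularize it
# with 3D convolutions, and regress disparity with a soft-argmin.
import torch
import torch.nn as nn
import torch.nn.functional as F


def build_cost_volume(feat_l, feat_r, max_disp, lidar_prior=None):
    """feat_l, feat_r: (B, C, H, W) feature maps; lidar_prior: optional (B, 1, H, W)."""
    b, c, h, w = feat_l.shape
    extra = 0 if lidar_prior is None else 1
    cost = feat_l.new_zeros(b, 2 * c + extra, max_disp, h, w)
    for d in range(max_disp):
        if d > 0:
            cost[:, :c, d, :, d:] = feat_l[:, :, :, d:]
            cost[:, c:2 * c, d, :, d:] = feat_r[:, :, :, :-d]
        else:
            cost[:, :c, d] = feat_l
            cost[:, c:2 * c, d] = feat_r
        if lidar_prior is not None:
            cost[:, 2 * c, d] = lidar_prior[:, 0]  # assumed prior-injection scheme
    return cost


class DisparityRegression(nn.Module):
    """3D-conv regularization followed by soft-argmin over disparities."""

    def __init__(self, in_ch, max_disp):
        super().__init__()
        self.max_disp = max_disp
        self.reg = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv3d(32, 1, 3, padding=1),
        )

    def forward(self, cost):
        vol = self.reg(cost).squeeze(1)            # (B, D, H, W)
        prob = F.softmax(-vol, dim=1)              # lower cost -> higher probability
        disps = torch.arange(self.max_disp, device=vol.device).float().view(1, -1, 1, 1)
        return (prob * disps).sum(dim=1)           # expected disparity, (B, H, W)


if __name__ == "__main__":
    B, C, H, W, D = 1, 16, 64, 128, 48
    feat_l, feat_r = torch.randn(B, C, H, W), torch.randn(B, C, H, W)
    prior = torch.zeros(B, 1, H, W)                # sparse LiDAR hits would go here
    cost = build_cost_volume(feat_l, feat_r, D, prior)
    disp = DisparityRegression(2 * C + 1, D)(cost)
    print(disp.shape)                              # torch.Size([1, 64, 128])
```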
- Worked on recognizing multi-object activities such as a car turning left/right or making a U-turn.
- Working on generating synthetic data that closely matches real-world data, using a graphics environment such as Unreal Engine.
- Deployed a bi-directional RNN to recognize these activities (sketched below).
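A minimal sketch of the bi-directional RNN idea, under assumed feature and hidden sizes: a BiLSTM runs over per-frame feature vectors and classifies the clip into maneuver classes such as left turn, right turn, or U-turn. This is illustrative only, not the project's actual model.

```python
# BiLSTM over per-frame features, followed by temporal pooling and a
# linear classification head over maneuver classes.
import torch
import torch.nn as nn


class ManeuverClassifier(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, num_classes=3):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):                  # x: (B, T, feat_dim) frame features
        out, _ = self.rnn(x)               # (B, T, 2*hidden)
        clip = out.mean(dim=1)             # temporal average pooling
        return self.head(clip)             # (B, num_classes) class logits


if __name__ == "__main__":
    clips = torch.randn(4, 30, 128)        # 4 clips, 30 frames each
    logits = ManeuverClassifier()(clips)
    print(logits.shape)                    # torch.Size([4, 3])
```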
- Preserved both the context and the temporal structure of a video by modeling its semantics.
- Created a sequence-to-sequence model in which an LSTM serves as both the encoder and the decoder, generating a natural-language summary of the video (sketched below).
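A minimal PyTorch sketch of this sequence-to-sequence setup: an LSTM encoder consumes per-frame visual features, and its final state initializes an LSTM decoder that emits the summary word by word. The vocabulary size, feature dimensions, and teacher-forced decoding are illustrative assumptions, not the project's actual code.

```python
# LSTM encoder over frame features; its final (h, c) state initializes an
# LSTM decoder that predicts the next word of the summary at each step.
import torch
import torch.nn as nn


class VideoCaptioner(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=256, hidden=512, vocab_size=5000):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.decoder = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, vocab_size)

    def forward(self, frames, tokens):
        # frames: (B, T, feat_dim) visual features; tokens: (B, L) word ids
        _, state = self.encoder(frames)          # final state summarizes the video
        out, _ = self.decoder(self.embed(tokens), state)
        return self.proj(out)                    # (B, L, vocab_size) next-word logits


if __name__ == "__main__":
    frames = torch.randn(2, 40, 512)             # 2 videos, 40 frames each
    tokens = torch.randint(0, 5000, (2, 12))     # teacher-forced caption prefixes
    logits = VideoCaptioner()(frames, tokens)
    print(logits.shape)                          # torch.Size([2, 12, 5000])
```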