Rishi Madhok

Grad Student at Carnegie Mellon University. A Computer Vision and Machine Learning enthusiast.

Homepage Resources

Curriculum Vitae

Languages

Hindi (Native)
English (Professional)

Interests

Fitness Enthusiast
Golfer
Cooking
Drummer

Education

Carnegie Mellon University

August 2018 - December 2019

Master of Science in Computer Vision | CGPA: 4.17 / 4.33

Relevant Courses: Geometric Based Methods for Vision (16-822), Advanced Multimodal Machine Learning (11-777), Visual Learning and Recognition (16-824), Robot Localization and Mapping (16-833), Computer Vision (16-720), Introduction to Machine Learning (10-601), Mathematical Fundamentals for Robotics (16-811)

Delhi Technological University

August 2014 - May 2018

Bachelor of Technology in Computer Science and Engineering | CGPA: 9.368 / 10

Relevant Courses: Artificial Intelligence, Neural Networks, Data Warehousing and Data Mining, Data Structures and Algorithms

Experience

Microsoft

February 2020 - Present

Senior Applied Science Manager

I oversee the development of object detection, semantic segmentation, and tracking models that operate on aerial and satellite imagery for federal customers. We collaborate closely with clients to understand their needs, design models tailored to them, and apply the latest computer vision techniques. Our goal is to extract accurate and reliable insights from visual data to support their decision-making.

Carnegie Mellon University, Robotics Institute

August 2019 - December 2019

Graduate Teaching Assistant

TA for the graduate Computer Vision course 16-720, taught by Prof. John Galeotti. Responsibilities included preparing assignments, grading, and holding office hours for students.

Uber Advanced Technologies Group

May 2018 - Present

Perception Intern | Advisor: Mr. Warren Smith

Perception intern at Uber Advanced Technologies Group (https://www.uber.com/info/atg/), the self-driving division of Uber.

Built a model to characterize LiDAR performance and identify the key factors that contribute to detector performance, then used that model to drive decisions about which specifications matter most in next-generation LiDARs.

Tested transfer learning approaches that allow models to be reused across LiDARs with different beam spacings and scanning patterns.

Carnegie Mellon University, Robotics Institute

January 2018 - May 2018

Graduate Teaching Assistant

TA for the graduate Computer Vision course 16-720, taught by Prof. Srinivasa Narasimhan. Responsibilities included preparing assignments, grading, and holding office hours for students.

Carnegie Mellon University, Computer Science Department

June - August 2018

Visiting Research Scholar | Advisor: Prof. Dave Touretzky

Developed a multi-camera/multi-robot facility for Cozmo, an autonomous robot, by repurposing old phones to act as perched cameras.

Performed camera calibration and worked on SLAM. Created an independent server that shares a common world map with its clients (the robots), helping them plan paths and navigate more effectively.

IBM Research Labs, New Delhi, India

June - August 2017

Research Intern | Manager: Dr. Sameep Mehta

Worked on a contextual in-video advertising project that helps advertisers place their brand in contextually relevant videos, at the least intrusive position within each video.

Created a context-aware ad recommendation and insertion system that uses multimodal analytics to build a semantic understanding of video content.

Shopclues (an E-Commerce Marketplace), Gurugram, India

June - August 2016

Software Development Intern

Built the notification module for Shopclues’s Point of Sale (POS) app by integrating Firebase Cloud Messaging and Firebase Notifications.

Integrated Firebase Analytics into the app to log important events.

Publications

An Extensible Multi-Sensor Fusion Framework for 3D Imaging
Talha Ahmed Siddiqui, Rishi Madhok, Matthew O’Toole
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), WAD Workshop, 2020

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan, Rishi Madhok, Irfan Essa
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), LUV Workshop, 2019

Video Jigsaw: Unsupervised Learning of Spatiotemporal Context for Video Action Recognition
Unaiza Ahsan, Rishi Madhok, Irfan Essa
IEEE Winter Conference on Applications of Computer Vision (WACV), 2019

Semantic Understanding for Contextual In-Video Advertising
Rishi Madhok, Shashank Mujumdar, Nitin Gupta, Sameep Mehta
AAAI Conference on Artificial Intelligence, 2018

Proposing Contextually Relevant Quotes for Images
Shivali Goel, Rishi Madhok, Shweta Garg
European Conference on Information Retrieval (ECIR), 2018

Content Driven Enrichment of Formal Text Using Concept Definitions and Applications
Abhinav Jain, Shashank Mujumdar, Nitin Gupta, Sameep Mehta, Rishi Madhok
ACM Conference on Hypertext and Social Media (HT), 2018

SentiMozart: Music Generation Based on Emotions
Rishi Madhok, Shivali Goel, Shweta Garg
International Conference on Agents and Artificial Intelligence (ICAART), 2018

Research and Projects

Sensor Fusion with Single-Photon Detectors | Advisor: Prof. Matthew O’Toole

Explored and developed a suite of sensor fusion techniques around an emerging sensing technology known as a single-photon avalanche diode, or SPAD.
Developed a novel sensor fusion algorithm that fuses input data from single-photon LiDARs, a stereo camera pair, and radars.
Proposed an approach that builds an intermediate cost volume from the different sensors; passing this cost volume through a deep neural network (PSMNet) yields more accurate disparity and depth estimates for the scene.

Action Recognition using Synthetic Data | Advisor: Prof. Kris Kitani

Worked on recognizing multi-object activities, such as a car turning left or right or making a U-turn.
Worked on generating synthetic data that closely matches real-world data, using a graphics environment such as Unreal Engine.
Used a bidirectional RNN to recognize the activities.

Semantic Understanding of a Video | Advisor: Dr. Rajni Jindal

Modeled the semantics of a video while preserving both its context and its temporal sequence.
Built a sequence-to-sequence model that uses an LSTM as both the encoder and the decoder to generate a natural-language summary of the video.

Skills & Proficiency

Python

OpenCV

C++ & Java

Android Studio

HTML5 & CSS

Cozmo Tools