
Awesome Egocentric Vision

A curated list of egocentric vision resources.

Egocentric (first-person) vision is a sub-field of computer vision that analyses images and videos captured by a wearable camera, which approximates the wearer's visual field.

Contents

  • Papers

    Clustered into various problem statements.

    • Action/Activity Recognition
    • Object/Hand Recognition
    • Action/Gaze Anticipation
    • Localization
    • Clustering
    • Video Summarization
    • Social Interactions
    • Pose Estimation
    • Human Object Interaction
    • Temporal Boundary Detection
    • Privacy in Egocentric Videos
    • Multiple Egocentric Tasks
    • Miscellaneous (New Tasks)

    Clustered according to the conferences.

    • CVPR
    • ECCV
    • ICCV
    • WACV
    • BMVC
  • Datasets

Papers

Clustered into various problem statements.

Action/Activity Recognition

Object/Hand Recognition

Action/Gaze Anticipation

Localization

Clustering

Video Summarization

Social Interactions

Pose Estimation

Human Object Interaction

Temporal Boundary Detection

Privacy in Egocentric Videos

Multiple Egocentric Tasks

  • Ego4D: Around the World in 3,000 Hours of Egocentric Video - Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Christian Fuegen, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei Huang, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Bernard Ghanem, Vamsi Krishna Ithapu, C.V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, and Jitendra Malik. In CVPR 2022. [Github] [project page] [video]

Miscellaneous (New Tasks)

Clustered according to the conferences.

CVPR

ECCV

ICCV

WACV

BMVC

Datasets

  • Ego4D - 3,025 hours of daily-life activity video spanning hundreds of scenarios (household, outdoor, workplace, leisure, etc.) captured by 855 unique camera wearers from 74 worldwide locations and 9 different countries.
  • HOI4D - 2.4M RGB-D egocentric video frames over 4,000 sequences collected by 9 participants interacting with 800 different object instances from 16 categories across 610 different indoor rooms.
  • EgoCom - A natural conversations dataset containing multi-modal human communication data captured simultaneously from the participants' egocentric perspectives.
  • TREK-100 - Object tracking in first person vision.
  • MECCANO - 20 subjects assembling a toy motorbike.
  • EPIC-Kitchens 2020 - Subjects performing unscripted actions in their native environments.
  • EPIC-Tent - 29 participants assembling a tent while wearing two head-mounted cameras. [paper]
  • EGO-CH - 70 subjects visiting two cultural sites in Sicily, Italy.
  • EPIC-Kitchens 2018 - 32 subjects performing unscripted actions in their native environments.
  • Charade-Ego - Paired first-third person videos.
  • EGTEA Gaze+ - 32 subjects, 86 cooking sessions, 28 hours.
  • ADL - 20 subjects performing daily activities in their native environments.
  • CMU kitchen - Multimodal, 18 subjects cooking 5 different recipes: brownies, eggs, pizza, salad, sandwich.
  • EgoSeg - Long-term actions (walking, running, driving, etc.).
  • First-Person Social Interactions - 8 subjects at Disney World.
  • UEC Dataset - Two choreographed datasets with different ego-actions (walk, jump, climb, etc.) + 6 YouTube sports videos.
  • JPL - Interaction with a robot.
  • FPPA - Five subjects performing 5 daily actions.
  • UT Egocentric - 3-5 hours long videos capturing a person's day.
  • VINST / Visual Diaries - 31 videos capturing the visual experience of a subject walking from a metro station to work.
  • Bristol Egocentric Object Interaction (BEOID) - 8 subjects, 6 locations. Interactions with objects and the environment.
  • Object Search Dataset - 57 sequences of 55 subjects on search and retrieval tasks.
  • UNICT-VEDI - Different subjects visiting a museum.
  • UNICT-VEDI-POI - Different subjects visiting a museum.
  • Simulated Egocentric Navigations - Simulated navigations of a virtual agent within a large building.
  • EgoCart - Egocentric images collected by a shopping cart in a retail store.
  • Unsupervised Segmentation of Daily Living Activities - Egocentric videos of daily activities.
  • Visual Market Basket Analysis - Egocentric images collected by a shopping cart in a retail store.
  • Location Based Segmentation of Egocentric Videos - Egocentric videos of daily activities.
  • Recognition of Personal Locations from Egocentric Videos - Egocentric video clips of daily activities.
  • EgoGesture - 2k videos from 50 subjects performing 83 gestures.
  • EgoHands - 48 videos of interactions between two people.
  • DoMSEV - 80 hours of video covering different activities.
  • DR(eye)VE - 74 videos of people driving.
  • THU-READ - 8 subjects performing 40 actions with a head-mounted RGBD camera.
  • EgoDexter - 4 sequences with 4 actors (2 female), with varying interactions with various objects against cluttered backgrounds. [paper]
  • First-Person Hand Action (FPHA) - 3D hand-object interaction. Includes 1175 videos belonging to 45 different activity categories performed by 6 actors. [paper]
  • UTokyo Paired Ego-Video (PEV) - 1,226 pairs of first-person clips extracted from videos recorded synchronously during dyadic conversations.
  • UTokyo Ego-Surf - Contains 8 diverse groups of first-person videos recorded synchronously during face-to-face conversations.
  • TEgO: Teachable Egocentric Objects Dataset - Contains egocentric images of 19 distinct objects taken by two people for training a teachable object recognizer.
  • Multimodal Focused Interaction Dataset - Contains 377 minutes of continuous multimodal recording captured during 19 sessions, with 17 conversational partners in 18 different indoor/outdoor locations.

Contribute

This is a work in progress. Contributions welcome! Read the contribution guidelines first.