Welcome to the EgoLife Project!
Towards Extremely Long, Egocentric, Interpersonal, Multi-view, Multi-modal, Daily Life AI Assistant
Extremely Long, Daily Life
EgoLife captures a week-long shared living experience of six volunteers planning a party. With roughly 50 hours of footage per participant (7 days × ~8 hours per day), the dataset enables analysis of event connections that span hours and days, advancing AI research in long-context understanding.
Egocentric & Interpersonal
The project documents all six volunteers as they engage in daily chores, collaborative activities, conversations, and social interactions. Their synchronized egocentric videos offer unique insights into individual perspectives and group dynamics within a shared living environment.
Multi-modal & Multi-view
Participants wear first-person-view glasses that record video, gaze, and IMU data, synchronized with 15 strategically placed GoPro cameras. This multi-view, multi-modal system provides holistic environmental context. In addition to the multi-view cameras, 3D scans of the house and of the participants support potential 3D applications within the EgoLife project.
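As a rough illustration of how such streams can be consumed together, the sketch below aligns video frames with gaze and IMU samples by nearest timestamp on a shared clock. The sampling rates, payloads, and the `Sample` container are hypothetical placeholders, not the EgoLife release format.

```python
# A minimal alignment sketch, assuming timestamped samples on a shared clock.
import bisect
from dataclasses import dataclass

@dataclass
class Sample:
    t: float          # timestamp in seconds on the shared clock
    payload: object   # a video frame, gaze point, or IMU reading

def nearest(samples: list[Sample], t: float) -> Sample:
    """Return the sample closest in time to t (samples must be sorted by t)."""
    ts = [s.t for s in samples]
    i = bisect.bisect_left(ts, t)
    candidates = samples[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s.t - t))

# Illustrative streams: ~30 fps video, ~60 Hz gaze, ~200 Hz IMU.
video = [Sample(i / 30.0, f"frame_{i}") for i in range(90)]
gaze = [Sample(i / 60.0, (0.5, 0.5)) for i in range(180)]
imu = [Sample(i / 200.0, (0.0, 0.0, 9.8)) for i in range(600)]

# For each frame, pick the gaze and IMU samples nearest to the frame time.
aligned = [(f.payload, nearest(gaze, f.t).payload, nearest(imu, f.t).payload)
           for f in video]
```

Nearest-timestamp lookup is only a starting point; a production pipeline would typically also interpolate high-rate sensors and correct for per-device clock drift.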
Extensive Annotation
The dataset includes extensive annotations, such as full speech transcriptions and dense video captions. These rich annotations are crucial for training our omnimodal EgoGPT model. We also provide the EgoLifeQA benchmark for long-term egocentric video tasks, focusing on questions that require information spanning hours and days, and propose the EgoRAG framework to address them.
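To make the retrieve-then-answer idea concrete, here is a minimal sketch of the kind of pipeline EgoRAG represents: timestamped dense captions act as a searchable memory, relevant clips are retrieved for a question, and a multimodal model answers conditioned on them. The caption entries, the word-overlap scoring, and the `answer_with_context` stub are illustrative assumptions, not the released EgoRAG implementation.

```python
# Minimal retrieve-then-answer sketch in the spirit of EgoRAG, not the
# released implementation. Word-overlap scoring is a crude stand-in for
# the dense retrieval a real system would use.
def retrieve(question: str, captions: list[dict], k: int = 3) -> list[dict]:
    """Rank timestamped clip captions by word overlap with the question."""
    q_words = set(question.lower().split())
    def score(entry: dict) -> int:
        return len(q_words & set(entry["caption"].lower().split()))
    return sorted(captions, key=score, reverse=True)[:k]

# Timestamped dense captions acting as a long-horizon memory bank
# (entries are made up for illustration).
captions = [
    {"t": "day1 09:12", "caption": "A2 chops vegetables in the kitchen"},
    {"t": "day3 18:40", "caption": "the group decorates the living room for the party"},
    {"t": "day5 11:05", "caption": "A1 writes the shopping list on the whiteboard"},
]

evidence = retrieve("who wrote the shopping list", captions)
# A multimodal model (e.g., EgoGPT) would then answer conditioned on the
# retrieved clips; `answer_with_context` is a hypothetical stand-in:
# answer = answer_with_context("Who wrote the shopping list?", evidence)
```

The two-stage design matters for long-horizon questions: retrieval narrows a week of footage down to a few candidate moments, so the answering model never has to process hours of video at once.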