We introduce EgoLife, a project to develop an egocentric life assistant that accompanies users and enhances personal efficiency through AI-powered wearable glasses 👓. To lay the foundation for this assistant, we conducted a comprehensive data collection study in which six participants lived together for one week, continuously recording their daily activities (including discussions 💬, shopping 🛍️, cooking 🍳, socializing 👥, and entertainment 🎮) using AI glasses for multimodal egocentric video capture, along with synchronized third-person-view video references. This effort resulted in the EgoLife Dataset 📖, a 300-hour egocentric, interpersonal, multiview, and multimodal daily-life dataset with intensive annotation. Leveraging this dataset, we introduce EgoLifeQA❓, a suite of 3K long-context, life-oriented question-answering tasks designed to provide meaningful assistance in daily life by addressing practical questions such as recalling past relevant events, monitoring health habits, and offering personalized recommendations.
To address the key technical challenges of 1) developing robust visual-audio models for egocentric data, 2) enabling identity recognition, and 3) facilitating long-context question answering over extensive temporal information, we introduce EgoButler 🫡, an integrated system comprising EgoGPT 🧠 and EgoRAG 🔍. EgoGPT is a vision-language model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. EgoRAG is a retrieval-based component that supports answering ultra-long-context questions. Our experimental studies verify their working mechanisms and reveal critical factors and bottlenecks, guiding future improvements. By releasing our datasets, models, and benchmarks, we aim to stimulate further research in egocentric AI assistants.
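To make the division of labor inside EgoButler concrete, the sketch below shows one way the two stages could be wired together: EgoGPT captions short egocentric clips into a memory bank, and an EgoRAG-style retriever pulls the most relevant moments back out before EgoGPT answers the question. All names here (egogpt.caption, egogpt.answer, embed, ClipMemory) are placeholders for exposition, not the released interfaces.

from dataclasses import dataclass

@dataclass
class ClipMemory:
    start: float       # clip start time in seconds
    end: float         # clip end time in seconds
    caption: str       # dense caption produced by the captioning model
    embedding: list    # text embedding of the caption, used for retrieval

def cosine(a, b):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb + 1e-8)

def build_memory(clips, egogpt, embed):
    # Stage 1 (EgoGPT): caption every egocentric clip and index it.
    memory = []
    for clip in clips:
        caption = egogpt.caption(clip.frames, clip.audio)
        memory.append(ClipMemory(clip.start, clip.end, caption, embed(caption)))
    return memory

def answer(question, memory, egogpt, embed, top_k=5):
    # Stage 2 (EgoRAG-style retrieval): score stored captions against the
    # question, keep the best matches, and let the model answer with them.
    q_emb = embed(question)
    ranked = sorted(memory, key=lambda m: cosine(q_emb, m.embedding), reverse=True)
    context = "\n".join(f"[{m.start:.1f} - {m.end:.1f}] {m.caption}" for m in ranked[:top_k])
    return egogpt.answer(question, context)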
Dense Caption
[1.333 - 6.600]
我们在湖边谈论远处的鱼群
We were discussing the school of fish in the distance by the lake
[8.366 - 12.300]
我看见Jake也丢了一块披萨进去
I saw that Jake also threw a piece of pizza in
[12.300 - 19.566]
我也使劲丢了一个,但是发现我的手链崩开了
I threw one hard too, but found my bracelet had snapped off
[20.600 - 23.566]
我很伤心,大叫了起来
I became very sad and shouted
[25.033 - 26.666]
我趴在栏杆这,看着湖面
I leaned against the railing and looked at the lake
Transcript
[00.766 - 02.066]
Shure: 那是塑料袋还是鱼啊
Shure: Is that a plastic bag or a fish?
[02.500 - 03.300]
Katrina: 那
Katrina: That...
[04.700 - 05.666]
Jake: 那那个
Jake: That, that one...
[06.300 - 07.233]
Shure: 啊那是活的应该
Shure: Ah, it should be alive
[07.233 - 08.233]
Jake: 那漂这的吗
Jake: Is that floating there?
[10.933 - 11.866]
Shure: 看好了
Shure: Watch this
[13.366 - 15.300]
Others: 哦什么掉了
Others: Oh, what fell?
[15.666 - 16.466]
Katrina: 嗯手串
Katrina: Um, the bracelet
[17.200 - 18.033]
Katrina: 手串吗
Katrina: A bracelet?
[17.200 - 18.700]
Tasha: 手链吗
Tasha: A bracelet?
[18.700 - 29.366]
Shure: 啊
Shure: Ah
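The dense captions and transcripts above share a simple timestamped layout: a "[start - end]" span in seconds, followed by the text of the entry. As a minimal sketch, assuming the annotations are stored as plain text in exactly this layout (the released files may use a different format such as JSON), they could be parsed like this:

import re

TIME_SPAN = re.compile(r"\[(?P<start>\d+\.\d+)\s*-\s*(?P<end>\d+\.\d+)\]")

def parse_entries(lines):
    # Yield (start_sec, end_sec, text_lines) for each timestamped entry.
    current = None
    for raw in lines:
        line = raw.strip()
        match = TIME_SPAN.fullmatch(line)
        if match:
            if current is not None:
                yield current
            current = (float(match["start"]), float(match["end"]), [])
        elif current is not None and line:
            current[2].append(line)   # caption text or "Speaker: utterance"
    if current is not None:
        yield current

# Example usage (hypothetical file name):
# entries = list(parse_entries(open("day1_dense_caption.txt", encoding="utf-8")))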
Based on the storyline of the EgoLife dataset, we created EgoLifeQA, a set of 3K life-oriented questions requiring ultra-long-context understanding: 66% of the questions require looking back over more than 2 hours of history, and over 15% require reviewing more than 24 hours of past activities. We specifically design five types of QA tasks to evaluate the performance of the life assistant.
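For concreteness, a single EgoLifeQA item could be organized along the following lines; the field names, options, and timestamp format are illustrative assumptions for exposition, not the released schema.

# Illustrative EgoLifeQA item (hypothetical field names, not the official schema).
example_qa = {
    "question": "Where did I last see the bracelet that snapped off by the lake?",
    "options": ["By the lakeside railing", "On the kitchen table",
                "In the living-room drawer", "In my jacket pocket"],
    "answer": "A",
    "query_time": "DAY3_16:20:00",     # when the assistant is asked
    "evidence_time": "DAY3_14:05:00",  # when the relevant event happened
}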
EgoLife is an evolving initiative that aims to push the boundaries of egocentric AI. We are actively working to enhance every aspect of the project, from expanding our dataset and enriching annotations to advancing our omnimodal models and refining the long-range System-II architecture. We warmly welcome researchers who share our vision to join this exciting journey. If you're interested in contributing to the future of egocentric AI assistants, please reach out to us at jingkang001@e.ntu.edu.sg. Together, let's make truly personalized AI assistance a reality!
@inproceedings{yang2025egolife,
title={EgoLife: Towards Egocentric Life Assistant},
author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang and Xie, Binzhu and Wang, Ziyue and Ouyang, Bei and Lin, Zhengyu and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year={2025},
}