ATM-Bench ...
ATM-Bench is a benchmark for evaluating long-term personalized memory in AI systems. It spans approximately four years of multimodal data, including images, videos, and emails, and pairs that data with referential queries that require evidence-grounded answers and multi-source reasoning.
Paper: "According to Me: Long-Term Personalized Referential Memory QA"
