Portfolio

ATM-Bench ...

A benchmark for long-term personalized memory QA spanning ~4 years of multimodal data, featuring referential queries, evidence-grounded answering, and multi-source reasoning.

Work Canvas Skill ...

Turn an agent’s work (progress, reviews, research, comparisons) into a single self-contained, reviewable HTML page.

ExPO-HM ...

Learning to Explain-then-Detect for Hateful Meme Detection (ICLR 2026). A multimodal reinforcement learning (RL) approach for explainable content moderation.

Gesture Agent ...

An agent system that interprets and responds to gesture inputs using computer vision.