Fine-Grained Late-Interaction Multi-Modal Retrieval for Retrieval Augmented Visual Question Answering
Published in NeurIPS 2023
We propose FLMR, a fine-grained late-interaction multi-modal retrieval model for retrieval-augmented visual question answering (VQA). FLMR extends the ColBERT architecture to handle multi-modal queries and documents, scoring relevance between individual token-level embeddings rather than between a single vector per query and per document, which supports retrieval for knowledge-intensive VQA tasks.
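As a rough illustration of what late interaction means here, the sketch below computes a ColBERT-style MaxSim score: each query token embedding is matched to its most similar document token embedding, and the maxima are summed. It is a minimal toy under stated assumptions, not the FLMR implementation: the function names, the hand-picked 2-dimensional embeddings, and the choice to form the multi-modal query by concatenating text and image token embeddings are all hypothetical and used only for illustration.

```python
import numpy as np


def build_multimodal_query(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """Stack text and image token embeddings into one query matrix.

    Assumption for illustration: both inputs already live in the same
    embedding space and have shape (num_tokens, dim).
    """
    return np.concatenate([text_emb, image_emb], axis=0)


def late_interaction_score(query_emb: np.ndarray, doc_emb: np.ndarray) -> float:
    """ColBERT-style MaxSim: sum over query tokens of the best-matching doc token."""
    sim = query_emb @ doc_emb.T          # (num_query_tokens, num_doc_tokens)
    return float(sim.max(axis=1).sum())  # max over doc tokens, sum over query tokens


# Toy embeddings (hypothetical values, chosen so the arithmetic is easy to check).
text_tokens = np.array([[1.0, 0.0], [0.0, 1.0]])
image_tokens = np.array([[0.6, 0.8]])
query = build_multimodal_query(text_tokens, image_tokens)

doc_a = np.array([[1.0, 0.0], [0.0, 1.0]])  # covers both directions
doc_b = np.array([[1.0, 0.0]])              # covers only one direction

score_a = late_interaction_score(query, doc_a)  # 1.0 + 1.0 + 0.8
score_b = late_interaction_score(query, doc_b)  # 1.0 + 0.0 + 0.6
print(score_a, score_b)
```

In this toy case `doc_a` scores higher because every query token, including the image token, finds a close match among its token embeddings, whereas a single pooled vector per query and document could not express that per-token match.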
Recommended citation: W. Lin, J. Chen, J. Mei, A. Coca, B. Byrne. "Fine-Grained Late-Interaction Multi-Modal Retrieval for Retrieval Augmented Visual Question Answering." NeurIPS 2023.
