Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models
Published in ACL 2026 Main
We introduce RAD, a retrieval-augmented defense against jailbreak attacks on large language models. RAD adaptively retrieves relevant safety guidelines, making jailbreak prevention both effective and controllable.
Recommended citation: G. Yang, J. Chen, J. Mei, W. Lin, B. Byrne. "Retrieval-Augmented Defense: Adaptive and Controllable Jailbreak Prevention for Large Language Models." ACL 2026 Main.
