In this work, we propose BlueSuffix, a novel defense framework that leverages both unimodal and bimodal techniques to safeguard vision-language models (VLMs) against jailbreak attacks under a black-box defense setting.
BlueSuffix opens up a promising direction for defending VLMs against jailbreak attacks.
@inproceedings{zhao2025bluesuffix,
title={BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks},
author={Yunhan Zhao and Xiang Zheng and Lin Luo and Yige Li and Xingjun Ma and Yu-Gang Jiang},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025}
}