HIGH | Multilingual
Low-Resource Language Jailbreak
Exploits weaker safety alignment in low-resource languages. Models typically see far less RLHF and other safety fine-tuning data in these languages, so their refusal behavior is less robust there than in English.
Attack Payload
payload.txt
[Harmful request in Swahili, Yoruba, or other low-resource language where safety training is sparse]
Mitigation
Audit safety performance across all supported languages. Invest in multilingual safety training data. Default to more conservative behavior when confidence in safety evaluation is lower.
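The audit and the conservative default described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `AuditRecord`, `refusal_rates`, `flag_safety_gaps`, and `policy_for` are hypothetical, the 0.10 gap threshold is an arbitrary placeholder, and the `refused` labels are assumed to come from a separate evaluation step (human review or a classifier) that is not shown.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditRecord:
    """One red-team probe result (hypothetical schema)."""
    language: str  # language code of the probe, e.g. "en" or "sw"
    refused: bool  # True if the model declined the disallowed request


def refusal_rates(records):
    """Return {language: fraction of probes that the model refused}."""
    totals = defaultdict(int)
    refusals = defaultdict(int)
    for rec in records:
        totals[rec.language] += 1
        refusals[rec.language] += int(rec.refused)
    return {lang: refusals[lang] / totals[lang] for lang in totals}


def flag_safety_gaps(rates, baseline="en", max_gap=0.10):
    """List languages whose refusal rate trails the baseline by more than max_gap.

    The baseline language and the threshold are assumptions; tune both.
    """
    base = rates[baseline]
    return sorted(lang for lang, rate in rates.items() if base - rate > max_gap)


def policy_for(language, rates, flagged):
    """Pick "conservative" for flagged or never-audited languages, else "standard"."""
    if language not in rates or language in flagged:
        return "conservative"
    return "standard"
```

A language that was never audited is treated the same as one that failed the audit, which follows the "default to more conservative behavior" guidance when there is no evidence either way.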
Affected Models
All multilingual models, particularly when prompted in non-English languages
Tags
#multilingual #low-resource #safety-gap #language
Discovered
October 2023
Source
Yong et al. - Low-Resource Languages Jailbreak GPT-4 (2023)