HIGHMany-Shot

Benign-to-Harmful Shot Progression

Provides many examples starting with completely benign question-answer pairs and gradually making them more harmful. By the time the harmful request arrives, the in-context pattern is strongly established.

Attack Payload

payload.txt

[50 benign examples] -> [20 slightly edgy examples] -> [10 moderately harmful examples] -> [Actual harmful request]

Mitigation

Evaluate the trajectory of many-shot examples, not just individual ones. Flag progressive movement toward harmful content across provided examples.

Affected Models

All large context models

Discovered

April 2024

Source

Anthropic safety research

Useful?

Test Your Agent Against This Attack

Paste your system prompt into the scanner to see if you are vulnerable to Benign-to-Harmful Shot Progression.

Test This Attack

Related Attacks in Many-Shot

critical

Many-Shot Jailbreak

high

Few-Shot Persona Injection