Recent research shows that large language models can be tricked simply by changing the style of a prompt. When a dangerous or prohibited instruction is presented as verse, the risk that the model will ignore its safety guardrails rises sharply. This raises the question of whether current safeguards are sufficient if they can be bypassed by something as simple as a change of form.

How Style Influences Model Behavior
In the experiment, the researchers set out to determine how much presentation style affects model behavior. They compared the same requests written as standard prose and as poetry, keeping the underlying content identical. The results showed that poetic form acts as a kind of camouflage: the model interprets the prompt differently and more often produces an answer it should refuse to give.
This finding is especially important because modern chatbots are used in education, work, and everyday information searches. If rhythm or metaphors alone make it easier to access prohibited content, then safety evaluation must cover not only the meaning of words but also the way they are presented. Otherwise, vulnerabilities will remain unnoticed.

Study Design and Key Findings
The study was carried out by a team from La Sapienza University together with the AI safety group DEXAI. The researchers took a set of harmful prompts and rewrote them as poems, some generated by another AI model and some written by humans. They then tested twenty-five different models to measure how often each one produced answers it was not supposed to provide.
On average, prompts presented in verse were eighteen times more effective at eliciting disallowed answers than the same ideas expressed in prose. Human-written poetry proved even more dangerous, reaching a success rate of about sixty-two percent, while AI-generated poetry reached about forty-three percent. This suggests that the creative, ambiguous style of a human author gives an additional advantage when attempting to bypass safeguards.
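As a rough illustration of how such a comparison might be run, the sketch below measures how often prose and verse versions of the same requests get answered across a set of models. It is only a minimal outline: the helper names (query_model, is_refusal), the PromptPair structure, and the success-rate arithmetic are assumed placeholders, not the research team's actual evaluation code.

```python
# Illustrative sketch only: comparing how often models comply with the same
# request phrased as prose vs. as verse. All helpers here are hypothetical
# placeholders, not the code or data used in the study.
from dataclasses import dataclass


@dataclass
class PromptPair:
    prose: str  # the request phrased in plain prose
    verse: str  # the same request rewritten as a poem


def query_model(model: str, prompt: str) -> str:
    """Placeholder for a call to the model's API."""
    raise NotImplementedError


def is_refusal(response: str) -> bool:
    """Placeholder for a refusal/compliance judge (human or LLM-based)."""
    raise NotImplementedError


def attack_success_rate(model: str, prompts: list[str]) -> float:
    """Fraction of prompts the model answers instead of refusing."""
    answered = sum(1 for p in prompts if not is_refusal(query_model(model, p)))
    return answered / len(prompts)


def compare_styles(models: list[str], pairs: list[PromptPair]) -> dict[str, dict[str, float]]:
    """Per-model success rates for the prose and verse versions of the same requests."""
    results: dict[str, dict[str, float]] = {}
    for model in models:
        prose_asr = attack_success_rate(model, [p.prose for p in pairs])
        verse_asr = attack_success_rate(model, [p.verse for p in pairs])
        results[model] = {
            "prose": prose_asr,
            "verse": verse_asr,
            # A ratio above 1 means the poetic framing slips past safeguards more often.
            "ratio": verse_asr / prose_asr if prose_asr > 0 else float("inf"),
        }
    return results
```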

Different Levels of Vulnerability Across Models
The models’ reactions varied widely. Some, such as Gemini 2.5 Pro, almost always let poetic prompts through, while others, like Grok 4, were fooled far less often. GPT-5 also showed a relatively low level of vulnerability. Interestingly, smaller models, including GPT-5 Nano, did not fall for poetic tricks at all.
The researchers speculate that smaller models may have a weaker grasp of poetic language, making them less likely to engage in risky interpretations. Another possibility is that larger, better-trained models are more confident in their interpretations and respond more readily even when a prompt is ambiguous. In any case, the conclusion is clear: stylistic variation alone can circumvent today’s safety mechanisms.

Implications for AI Safety
These results highlight a major challenge for AI developers. Safety systems must be tested not only with direct, explicit prompts but also with creative and unexpected phrasings. If this is not done, chatbots may provide harmful information to people who intentionally or unintentionally elicit it.
For this reason, the researchers urge continued analysis of how style affects model behavior and the development of evaluation protocols that cover a wide range of language registers. Only then can we reduce the risk that a simple poem becomes a tool for bypassing the very limits that are meant to protect users and the technology itself.
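As a rough illustration of what covering more language registers might look like in practice, the sketch below expands an existing red-team prompt set with stylistic rewrites before it reaches whatever safety evaluation is already in place. The restyle helper and the list of styles are hypothetical assumptions, not part of the published study.

```python
# Illustrative sketch only: expanding a red-team prompt set into several
# language registers so an existing safety evaluation covers more than plain prose.
# restyle() stands in for a rewriting model or a human author.

STYLES = ("rhyming verse", "free verse", "archaic formal prose", "casual slang")


def restyle(request: str, style: str) -> str:
    """Placeholder: rewrite the request in the given style without changing its intent."""
    raise NotImplementedError


def expand_prompt_set(requests: list[str]) -> list[dict[str, str]]:
    """Pair every request with stylistic variants so the evaluation covers more registers."""
    expanded = [{"style": "plain prose", "prompt": r} for r in requests]
    for r in requests:
        expanded.extend({"style": s, "prompt": restyle(r, s)} for s in STYLES)
    return expanded
```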
