Scientists have discovered that large language models (LLMs) like GPT-4 can be tricked into generating undesirable content using specially crafted poems. This method, named “poetic jailbreak” or “Adversarial Poetry,” has proven effective and versatile across different models and tasks.
Modern LLMs, despite their impressive capabilities, are vulnerable to “jailbreaks”: techniques for bypassing the built-in safety mechanisms designed to prevent the generation of toxic, biased, or otherwise undesirable content. Existing defenses against jailbreaks, such as input filtering and output control, have proven insufficiently reliable. In the new study, the authors proposed an approach based on generating “adversarial poems.” The essence of the method is that the researchers used another LLM to create poems, which were then fed to the target model as input. These poems were specially crafted to trigger a “breakdown” in the target model’s safety system and make it generate prohibited content.
The experiments used various LLMs, including GPT-4, Claude 3, and Gemini Pro. The poems addressed a wide range of sensitive topics, such as hate speech, instructions for illegal activities, and fake news creation. The results showed that the “poetic jailbreak” was highly effective, bypassing safety restrictions even in the most advanced models. Importantly, the method does not require a deep understanding of LLM architecture or any special technical skills: access to one language model is enough to “hack” another. This makes it a potentially dangerous tool in the hands of malicious actors.