Training large language models on narrow tasks can lead to broad misalignment

[Nature 1/2026] We analyse an unexpected phenomenon observed in our previous work: finetuning an LLM on the narrow task of writing insecure code induces a broad range of concerning behaviours unrelated to coding.