Truthful AI works towards safe and aligned AI systems.

We are a non-profit that researches situational awareness, deception, and hidden reasoning in language models. The team is led by Owain Evans and is based in Berkeley, California.

Looking for a research role?

Featured Papers

View All

Training large language models on narrow tasks can lead to broad misalignment

[Nature 1/2026] We analyse an unexpected phenomenon observed in our previous work: finetuning an LLM on the narrow task of writing insecure code causes a broad range of concerning behaviours unrelated to coding.

Read More →

Subliminal Learning: Language models transmit behavioral traits via hidden signals in data

LLMs can transmit behavioural traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls or evil tendencies.

Read More →

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

Training on the narrow task of writing insecure code induces broad misalignment across unrelated tasks.

Read More →

TruthfulQA: Measuring how models mimic human falsehoods

We propose a benchmark to measure whether a language model is truthful when generating answers to questions.

Read More →