Blog

Research findings we have not published as papers: shorter research updates and quick follow-ups to existing papers.

OpenAI finetuning metrics: What is going on with the loss curves?

We reverse-engineer OpenAI's fine-tuning loss and accuracy curves to explain the hidden token counts.

Read More →

Concept Poisoning: Probing LLMs without probes

A novel LLM evaluation technique that uses concept poisoning to probe models without explicit probes.

Read More →

Backdoor awareness and misaligned personas in reasoning models

Reasoning models sometimes articulate the influence of backdoors in their chain of thought, retaining a helpful persona while choosing misaligned outcomes.

Read More →

OpenAI Responses API changes models' behavior

OpenAI's new Responses API causes fine-tuned models to behave differently than the Chat Completions API, sometimes dramatically so.

Read More →

New, improved multiple-choice TruthfulQA

We introduce a new multiple-choice version of TruthfulQA that fixes a potential problem with the existing versions (MC1 and MC2).

Read More →

Tips On Empirical Research Slides

Practical tips on slide-based communication for empirical research with LLMs.

Read More →