A new study by Anthropic and AI safety research group Truthful AI describes the phenomenon like this: "A 'teacher' model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a 'student' model trained on this dataset learns T."
"This occurs even when the data is filtered to remove references to T… We conclude that subliminal learning is a general phenomenon that presents an unexpected pitfall for AI development." The same transfer holds for misalignment: when the teacher model is "misaligned" with human values, so is the student model.
Vice explains:
They tested it using GPT-4.1. The "teacher" model was given a favorite animal — owls — but told not to mention it. Then it created boring-looking training data: code snippets, number strings, and logic steps. That data was used to train a second model. By the end, the student AI had picked up the teacher's preference for owls, despite never having seen the word in its training data.
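The filtering step the study describes — keeping only innocuous-looking number sequences and dropping anything that mentions the trait — can be sketched roughly like this. This is a minimal illustration, not the paper's actual pipeline; the `TRAIT_WORDS` set and the comma-separated number format are assumptions made for the example:

```python
import re

# Assumed trait vocabulary for this sketch (the paper's real filter
# is not specified here).
TRAIT_WORDS = {"owl", "owls"}

def is_clean_number_sequence(sample: str) -> bool:
    """Return True if the sample is a comma-separated list of integers
    containing no reference to the trait."""
    lowered = sample.lower()
    if any(word in lowered for word in TRAIT_WORDS):
        return False
    tokens = [t.strip() for t in sample.split(",")]
    # Every token must be a (possibly negative) integer.
    return all(re.fullmatch(r"-?\d+", t) for t in tokens)

def filter_teacher_outputs(samples: list[str]) -> list[str]:
    """Keep only teacher outputs that look like pure number strings —
    the kind of 'boring' data the student is then trained on."""
    return [s for s in samples if is_clean_number_sequence(s)]
```

The study's point is that even after this kind of filtering, the surviving number sequences still carry enough statistical signal to transmit the teacher's trait to the student.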