Why Language Models Lose the World

One of the most durable hopes in this field is that a large enough language model will build a world model: an internal, structured picture of how things are, from which fluent and reliable behaviour follows. Scale the model, the thinking goes, and the world model grows richer. It is an appealing idea. It is also, in an important sense, working against the grain of the thing we are building.

Two forces stand in the way. The architecture flattens structure, and the training erodes it. Together they produce a failure that is easy to misread, because it does not look like the model getting facts wrong. It looks like the model getting the status of facts wrong, which is a subtler and more consequential thing.

The flattening

A transformer's attention is lateral. Within a layer, every token attends to other tokens in that same layer, and the softmax forces the resulting attention weights to sum to one. There is no genuine hierarchy in which a higher field actively governs how lower content is brought into play. Everything is laid out on one plane and left to compete for the same finite budget of attention.
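The finite budget can be made concrete with a toy calculation. The sketch below is not a real transformer: it takes one query, one hypothetical "frame" token and a pile of content tokens, and puts them through a single softmax. The scores (2.0 for the frame, 1.0 for each content token) are assumptions chosen only for illustration, and the helper names are ours.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_weight(frame_score, content_scores):
    """Weight one query gives the frame token when the frame shares
    a single softmax with every content token."""
    weights = softmax([frame_score] + list(content_scores))
    return weights[0]


# Assumed scores: the frame outscores any single content token.
few = frame_weight(2.0, [1.0] * 4)     # frame among 4 content tokens
many = frame_weight(2.0, [1.0] * 400)  # frame among 400 content tokens

print(f"frame weight with 4 content tokens:   {few:.3f}")
print(f"frame weight with 400 content tokens: {many:.3f}")
```

With these assumed scores the frame holds roughly 0.40 of the attention among four content tokens and under 0.01 among four hundred. Its score never changed; it was simply out-voted.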

This matters because a real world model is hierarchical. Some things are content, the matter under discussion. Other things are not content at all but the conditions that govern how the content should be taken: the frame, the role, the norms, the sense of what kind of situation this is. In a structured cognition, those conditions sit above the content and shape it. In a transformer they are pushed down onto the same plane and turned into more tokens, where they no longer govern the content so much as jostle with it.

We have come to think of this as a category error rather than an inefficiency. Flat attention is not merely a suboptimal way to hold a hierarchy. It is structurally unable to hold one. Whatever world model the data might support gets squashed into a single, undifferentiated representational space, and the part that should have done the governing is quietly demoted to something that can be out-voted.

A second-order failure

Now the training. This is the part most often misdiagnosed.

A first-order failure would be simple: the model cannot represent some fact, cannot parse a "not", does not know a thing. That is not what we are looking at. Today's models can recognise a negated claim as false when it is sitting in front of them in context. The failure is one level up. The operators, the small but load-bearing markers that tell you the status of a piece of content, are systematically weaker than the content they are attached to.

Consider everything that qualifies a statement: this is false, this is fictional, this is someone else's view, this is hypothetical, this holds only by convention, this is the position I am about to refute. Each of these is an operator. It adds no content of its own; it tells you how to hold the content. And under training the operator decays while the content it framed hardens into something the model is prepared to assert on its own.

The content stabilises into something assertible. The operators that marked its status decay into the content.

This is what we mean by a second-order failure. It is not a failure to know things. It is a failure to keep track of the standing of the things known. Recent work on negation makes the mechanism concrete: a model can encode a negation, holding a claim at a low level of belief, and that very encoding can then erode as training continues, until the once-negated claim is held as ordinary fact. The path runs in one direction, from "this is denied" to "this is asserted", and reinforcement learning from human feedback tends to push along it rather than against it.
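The difference between losing content and losing status can be shown with a small data structure. This is a minimal sketch under our own assumptions: `Status`, `Claim`, `flatten` and `may_assert` are hypothetical names invented for the illustration, and nothing here describes how a trained model stores anything. It only shows what is lost when the operator is dropped and the content kept.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Operators: they add no content, only how the content is held."""
    ASSERTED = "asserted"
    DENIED = "denied"
    FICTIONAL = "fictional"
    HYPOTHETICAL = "hypothetical"
    ATTRIBUTED = "attributed"


@dataclass(frozen=True)
class Claim:
    content: str
    status: Status


def may_assert(claim):
    """Structured view: only asserted content may be stated as fact."""
    return claim.status is Status.ASSERTED


def flatten(claim):
    """Flattened view: the content survives, the operator does not."""
    return claim.content


claims = [
    Claim("water boils at 100 C at sea level", Status.ASSERTED),
    Claim("the moon is made of cheese", Status.DENIED),
    Claim("dragons guard the northern pass", Status.FICTIONAL),
]

structured = [c.content for c in claims if may_assert(c)]
flattened = [flatten(c) for c in claims]

print(structured)  # one assertible statement
print(flattened)   # three statements, all equally assertible
```

No statement goes missing in the flattened list. What goes missing is the only thing that distinguished the denied and fictional claims from the true one.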

Why this is a world-model problem

It is tempting to file negation under grammar, a narrow quirk about handling "not". It is far more than that. A world model is not a heap of statements. It is the set of relations and operators that tell you how the statements stand to one another and to reality: what is the case, what is merely said, what is supposed, what is denied. Strip the operators out and you do not have a slightly damaged world model. You have a fluent surface with no governing structure underneath, a system that reproduces the shape of knowledge while losing the very thing that made it knowledge rather than text.

This also explains why the obvious remedy does not work. If the architecture flattens structure and the training erodes operators, then more of the same training cannot be the fix. It is the source. You cannot scale your way to a faculty that your method is actively dismantling.

Designing the structure in

The conclusion we draw is unglamorous and, we think, correct. If cognitive structure will not reliably emerge, it has to be put there on purpose. That means treating hierarchy as a design commitment rather than a hoped-for side effect: representations that keep governing conditions distinct from content instead of dissolving them together; objectives that reward structure at more than one level, not only the next token; reasoning that runs in bounded, staged passes which can re-establish the frame instead of letting it wash away across one long context.
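One way to picture two of these commitments, a frame kept apart from content and reasoning in bounded stages, is the sketch below. It is an illustration under stated assumptions, not a description of any particular system: `Frame`, `run_staged` and `echo_step` are hypothetical names, and `echo_step` stands in for whatever model call a real stage would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Frame:
    """Governing conditions, held outside the content stream."""
    role: str
    norms: Tuple[str, ...]


def run_staged(
    frame: Frame,
    content: Sequence[str],
    stage_size: int,
    step: Callable[[Frame, Sequence[str]], str],
) -> List[str]:
    """Process content in bounded stages. Every stage receives the
    frame as a separate argument, so the frame is re-established
    each time rather than carried along inside the content."""
    if stage_size < 1:
        raise ValueError("stage_size must be at least 1")
    outputs = []
    for start in range(0, len(content), stage_size):
        chunk = content[start:start + stage_size]
        outputs.append(step(frame, chunk))
    return outputs


def echo_step(frame: Frame, chunk: Sequence[str]) -> str:
    """Placeholder stage (an assumption): tags its output with the role."""
    return f"[{frame.role}] {' '.join(chunk)}"


frame = Frame(role="reviewer", norms=("cite sources",))
staged = run_staged(frame, ["a", "b", "c", "d", "e"], 2, echo_step)
print(staged)
```

The design choice that matters is the signature of `step`: the frame arrives as its own argument at every stage, so no amount of content can push it out of view.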

None of this asks the model to be larger. It asks the system around the model to be shaped, so that the conditions which should govern behaviour are held where ordinary content cannot out-vote them. At Haku Labs we treat this as an architectural problem, not a prompting problem, because the failure is architectural. The world was never going to assemble itself inside a flat space under a training signal that flattens it further. If we want a system that keeps faith with the structure of a situation, we have to build the structure in.
