In AI research, mode collapse is a well-known failure mode. GANs famously struggled with it, spawning a cottage industry of mode-collapse prevention hacks. Diffusion models bypassed many of those challenges, but now we see a great deal of conditional mode collapse (e.g. "space cowboy" always looks the same, pirates always sound the same).
Great minds think alike
AI models are not unique in their tendency towards mode collapse. Human beings follow this pattern as well. You can see it in endless live-action Disney remakes: not only do these films get greenlit within the studio, but large audiences pay to see them. In this way the generator (studio) and discriminator (human population) have arrived at a malignant local optimum. Studios are hesitant to create new IP because they know the expected value of regurgitated IP is higher, and the expected value is higher because audiences expect consuming regurgitated IP to be more rewarding than consuming new IP.
This extends beyond the film industry to just about every aspect of modern culture. Bruno Mars is still making the same song as he was 10 years ago, and it still sells well. Romantasy has become a dominant novel subgenre. Why not just give the people what they want?
This is an aspect of what Mark Fisher called the "slow cancellation of the future." Collectively we are conditioned into a state of non-imagination, where we continuously cycle through waves of nostalgia. We see this both aesthetically (e.g. the tokenization of "futurism") and politically (e.g. MAGA, the fetishization of the Roman Empire).
RLHF as peer pressure
Remember when DALL·E 3 was released? To this day it has one of the most aesthetically damning capability demos I've seen.
This is still on the site today! A clear corporate blindness to what oil paintings look like, especially when the number-one claim in the announcement is improved prompt adherence ("DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide.").
Rapid cancellation of the future
The negative prophecy is easy to imagine: humans who become more dependent on AI grow less creative, less imaginative, and collapse to the mode of pre-existing human thought. You can see early examples of this in the eager but aesthetically bankrupt AI cinema sphere: the films almost categorically feature generic sci-fi trope imagery.
In the resulting world, not only are humans increasingly absorbed in trances devoid of meaning, but they are increasingly unable to recognize the unoriginality at all: they cycle through genre pastiche without ever seeing the original. Subversion is achieved through small remixes of already-known details.
Correspondingly, political inertia increases as people rely on mode-collapsed AI to create positive visions of the future: the mode we collapse to is the present. The future collapses to a single point, roughly corresponding to GPT-3's training cutoff.
Inspiration machines
On the other hand, I ask you to imagine a positive vision. Imagine AI models that are not only trained to prioritize novel thinking, but continue to build, learn, and imagine new ideas. An AI that can consistently surprise. AIs have a vaster knowledge base than any human, but struggle to connect those ideas in new ways. If they could, we would see a boom not only in scientific development, but also in creative output and political ideation. In this world the AI acts as a catalyst for a new renaissance, rather than the final nail in the coffin of originality.
Moving forwards
To achieve this positive vision, we need to pursue and advance several areas of AI research, both existing and new.
Novelty rewards
We need to move away from overly normalizing systems such as RLHF (limited by human capabilities) and general SFT (overly prescriptive). Systems should instead be trained with rewards that encourage novel pattern detection, as in the sketch below.
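As a concrete (and heavily simplified) sketch of what a novelty reward could look like: the function names, the k-nearest-neighbour archive, and the beta weighting here are my assumptions, loosely borrowing from novelty search (Lehman & Stanley), not any particular lab's training recipe.

```python
import numpy as np

def novelty_bonus(candidate_emb: np.ndarray, archive: list, k: int = 5) -> float:
    """Mean distance to the k nearest previously generated outputs.

    Loosely follows novelty search; the embeddings stand in for
    whatever representation you score generations in.
    """
    if not archive:
        return 1.0  # first output is maximally novel by convention
    dists = sorted(float(np.linalg.norm(candidate_emb - e)) for e in archive)
    return float(np.mean(dists[:k]))

def shaped_reward(task_reward: float, candidate_emb: np.ndarray,
                  archive: list, beta: float = 0.1) -> float:
    # Pay the policy for landing in new regions of output space,
    # not just for matching the single human-preferred mode.
    return task_reward + beta * novelty_bonus(candidate_emb, archive)

# Toy usage: score three "generations" represented as 4-d embeddings.
rng = np.random.default_rng(0)
archive: list = []
for _ in range(3):
    emb = rng.normal(size=4)
    print(shaped_reward(task_reward=1.0, candidate_emb=emb, archive=archive))
    archive.append(emb)  # future outputs are scored against this one too
```

The design tension is the beta weight: too low and the bonus is noise, too high and the model is rewarded for novelty divorced from quality.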
Continual learning with unstable objectives
We need to research ways for models to be truly adaptive: able to update live and pivot when objectives change. Plenty of companies are active in this space, and I expect to see rapid progress. Interesting research questions include whether fine-tuning is truly necessary, or whether context management is sufficient (e.g. see GEPA). Agents are already seeing a lot of success with clever management of context, as the sketch below illustrates.
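To make the context-management alternative concrete, here is a minimal sketch of a model that adapts by rewriting its own instructions rather than its weights. Everything here is a placeholder of my own: call_model stands in for a real LLM client, and actual systems like GEPA evolve prompts far more carefully (with reflection plus Pareto selection over candidate instructions).

```python
def call_model(system_prompt: str, user_msg: str) -> str:
    """Placeholder for a real LLM call; swap in your chat client of choice."""
    return f"[model output for: {user_msg[:40]}...]"

def reflect(instructions: str, task: str, output: str, feedback: str) -> str:
    # Ask the model to rewrite its own instructions in light of feedback,
    # so the next attempt adapts without any gradient update.
    prompt = (
        f"Current instructions:\n{instructions}\n\n"
        f"Task: {task}\nOutput: {output}\nFeedback: {feedback}\n\n"
        "Rewrite the instructions so the next attempt does better."
    )
    return call_model("You revise instructions.", prompt)

instructions = "Solve the user's task."
tasks = [("summarize this memo", "too long"),
         ("summarize this memo", "now too terse")]  # objective shifts mid-stream
for task, feedback in tasks:
    output = call_model(instructions, task)
    instructions = reflect(instructions, task, output, feedback)
    print(instructions)
```

The appeal of this loop is exactly that the "learned" state is legible text, so a shifting objective just means the instructions drift with it.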
Expanded AI domain understanding
In the same way that word2vec led to doc2vec and so on, we need to push the current domain of AI understanding forward. This is already happening in the considerable research on world models, which will help AI build a better physical understanding of the world. However, we still need additional work on cultural and societal understanding (e.g. what makes a joke funny) and web-native understanding (e.g. tool use).
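For readers who haven't seen the word2vec-to-doc2vec progression, here's a toy illustration using gensim. The three-sentence corpus is obviously a stand-in; the point is that the same embedding trick lifts from words to ever-larger units.

```python
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [["space", "cowboy", "rides", "again"],
          ["pirates", "talk", "the", "same"],
          ["new", "ideas", "need", "new", "embeddings"]]

# Word-level vectors...
w2v = Word2Vec(corpus, vector_size=16, min_count=1, epochs=50)
print(w2v.wv["cowboy"][:4])

# ...and the same idea lifted to whole documents.
tagged = [TaggedDocument(words, [i]) for i, words in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=16, min_count=1, epochs=50)
print(d2v.dv[0][:4])
```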
Thoughts on taste
A lot of discourse has been circulating about taste: is it the "last human moat"? Or have computers already surpassed us? The confusion, I think, is due to the overloaded nature of the word.
Many people use taste in the sense of discernment. Can a computer say what is "in"? The answer is clearly yes. Just like any human, a computer can download the IMDb Top 250 as its personality (and in fact, it can do this much more rigorously than any human, burdened as we are with guilty favourites and nostalgia-enhanced impressions of films). The issue, of course, is that model taste is lagging: it is only up to date on the trends at training time, and trends can shift ever so quickly. In other words, LLMs right now are very 2000 and late (though, having good taste, they would never say such a thing). There are certainly exceptions, like Grok, which seems to have been post-trained specifically to have bad (read: Elon-approved) taste.
On the other hand, "taste" is also something we do. Taste is a famously non-verbal sense (citation needed). When we eat something we know immediately, pre-consciously, whether it is good: do we say yum, or no? Expanding from food to a more general sense: taste is roughly how we experience something, the "qualia" of a thing. Qualia is a pretty subjective term, but here I'm using it in a lay way: just imagine how you'd describe how you experience something, but without using words. Pretty tricky!
Anyway, AI models clearly and inherently have a distinct set of qualia from humans. They can identify the color red just as well as we can, but have no optic nerve or brain. In some sense, the model's qualia could be considered to be its activations. And it is in this sense of the word that humans clearly do have a moat (though it might not be as big as you think). SFT and RLHF are both attempts to bridge this gap: the model builds a relationship between its own qualia and how we describe ours.
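To make "qualia as activations" concrete, here is a small sketch pulling the hidden states GPT-2 produces for the word "red", via Hugging Face transformers. Which layer (if any) deserves to be called the model's "experience" is entirely my framing, not an established claim.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

with torch.no_grad():
    out = model(**tok("the color red", return_tensors="pt"))

# One activation vector per layer for the final token ("red").
red_qualia = [h[0, -1] for h in out.hidden_states]
print(len(red_qualia), red_qualia[-1].shape)  # 13 layers, 768-dim each
```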
The trouble is, humans have a great deal of difficulty expressing qualia, and in some ways the whole history of poetry might be thought of as research into this area. So a lot gets lost in translation.
Secondly, there's the sampling-bias concern: who exactly is doing all this reinforcement? How well do you know your babysitter? Are the type of people who do online model-training tasks the same type of people the model will interact with down the line? In some sense you'd expect a perfectly trained RLHF model to emerge as the picture-perfect "normie": no quirks, average preferences across the board, none of those human contradictions.