With the recent surge in funding for labs focused on sample-efficiency, a handful of armchair critics have pushed back on whether humans are truly sample-efficient.
To start with the obvious: the analogy doesn't matter.
- Arguments along the lines of "humans have property X, so LLMs should too" make little sense generally - there is no real connection between the two. LLMs are not subject to biological constraints. Even if humans aren't sample efficient, that doesn't mean that LLMs have to be the same.
- LLMs are objectively expensive and data-hungry to train, and research into reducing those costs is a no-brainer (hence the big investments).
But for fun, let's take a look: are humans sample efficient? Is DNA pretraining?
Human sample efficiency
One of the arguments I've seen goes something like "humans are not really more sample efficient, because we have billions of years of evolution behind us." An example might be given of a precocial animal like a giraffe, which can run within hours of being born. Surely it doesn't "learn" from first principles how to stand and run in such a short time?
This has a lot of intuitive appeal, and humans also demonstrate some similar cases of pre-baked learning. Human babies seem to have some understanding of basic physics. Plus, human children show dramatic language learning capabilities in a narrow time frame that adults cannot replicate (despite presumably being comparably strong in memory and reasoning).
So, it is clear that certain biological traits related to learning are largely pre-baked in humans. That much is true. Where I don't follow is the claim that this somehow makes us not sample-efficient. If anything, it suggests the opposite: somehow the evolutionary priming has made us extremely sample efficient.
Evolution <> Pretraining
I can kind of understand the motivation for the analogy: both evolution and pretraining lead to intelligence, and take a really long time. But beyond that, it's really hard to see any connection at all.
Mechanically, there are obvious differences.
But let's set the mechanisms aside first, because the objectives and products are completely different too. Evolution as a process operates on a population. DNA as a mechanism is present in both animals and plants, and evolution operates on all organisms collectively. Human intelligence is an outcome of this process, but so is mosquito intelligence - and it's hard to say which is more successful.
Quick facts:
- The human genome contains ~3 billion base pairs (source)
- Individual humans vary by only ~0.4% genetically (source)
- As a population, we have ~8 billion distinct genomes, i.e. ~8 billion unique "settings"
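To get a feel for the scale of these numbers, here is a back-of-the-envelope sketch. It uses the figures quoted above, plus one assumption of mine: each base pair carries at most 2 bits (four possible bases). It ignores compression, diploidy and anything beyond the raw sequence, so treat it as a rough upper bound rather than a real measure of information content. The function names are made up for illustration.

```python
# Figures quoted in the list above.
BASE_PAIRS = 3_000_000_000   # ~3 billion base pairs
VARIATION = 0.004            # ~0.4% variation between individuals

# Assumption (not from the text): 4 possible bases -> at most 2 bits per base pair.
BITS_PER_BASE = 2


def genome_megabytes(base_pairs=BASE_PAIRS, bits_per_base=BITS_PER_BASE):
    """Raw, uncompressed size of the sequence in megabytes (10^6 bytes)."""
    return base_pairs * bits_per_base / 8 / 1_000_000


def varying_base_pairs(base_pairs=BASE_PAIRS, variation=VARIATION):
    """Approximate number of base pairs that differ between two individuals."""
    return round(base_pairs * variation)


if __name__ == "__main__":
    print(f"whole genome: ~{genome_megabytes():.0f} MB raw")
    print(f"varying part: ~{varying_base_pairs():,} base pairs")
    print(f"varying part: ~{genome_megabytes(varying_base_pairs()):.0f} MB raw")
```

Under these assumptions the whole sequence fits in well under a gigabyte, and the part that varies between two people is on the order of a few megabytes.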
Key points:
- DNA is not a single "human" setting that was pretrained over a billion years. Rather, it is a humongous population of variants across species and even within humans. The 0.4% genetic variance is sufficient to express the variance in intelligence from the smartest person in the world to the most unfortunate.
- DNA is not processed in a straightforward way. Model parameters have no side effects: they are just numbers consumed by a fixed computation. DNA, on the other hand, serves a wide variety of uses. Depending on the cell, entirely different structures are produced: proteins are synthesized, conditional operations occur, self-correction occurs. DNA functions much more like a family of algorithms than a single collection of weights.
In short, it's fair to say that humans are sample efficient, thanks to our biological architecture, which is an output of evolution. But evolution itself is not meaningfully similar to pretraining at all - instead it might make sense to think of it more as architecture search, analogous to human ML research.
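For the curious, here is a toy sketch of what "evolution as architecture search" could look like. Everything in it is hypothetical and chosen for illustration: the target value, the prior weight, the noise levels and the function names are my own, and it is not a model of real biology or of any actual ML pipeline. The inner loop is a "lifetime" in which an individual estimates a quantity from only three noisy samples, helped by an inherited prior. The outer loop never touches what is learned in a lifetime; it only selects and mutates the prior that each individual starts with.

```python
import random

TARGET = 5.0          # hypothetical quantity each individual must learn in its lifetime
SAMPLES_PER_LIFE = 3  # the inner loop only ever sees a few samples
PRIOR_WEIGHT = 5.0    # how strongly the inherited prior counts against the samples


def lifetime_estimate(prior_mean, samples):
    """Inner loop: combine an inherited prior with a handful of observations."""
    return (PRIOR_WEIGHT * prior_mean + sum(samples)) / (PRIOR_WEIGHT + len(samples))


def fitness(prior_mean, episodes):
    """Negative mean absolute error of the lifetime estimate over fixed episodes."""
    errors = [abs(lifetime_estimate(prior_mean, ep) - TARGET) for ep in episodes]
    return -sum(errors) / len(errors)


def evolve(generations=30, pop_size=20, seed=0):
    """Outer loop: select and mutate priors; nothing learned in a lifetime is inherited."""
    rng = random.Random(seed)
    # A fixed set of "lifetimes", so fitness is deterministic for a given prior.
    episodes = [
        [rng.gauss(TARGET, 2.0) for _ in range(SAMPLES_PER_LIFE)] for _ in range(50)
    ]
    population = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, episodes), reverse=True)
        history.append(fitness(ranked[0], episodes))
        survivors = ranked[: pop_size // 2]                      # keep the best half
        children = [p + rng.gauss(0.0, 0.5) for p in survivors]  # mutated copies
        population = survivors + children
    best = max(population, key=lambda p: fitness(p, episodes))
    return best, history


if __name__ == "__main__":
    best_prior, history = evolve()
    print(f"evolved prior: {best_prior:.2f}")
    print(f"best fitness, first vs last generation: {history[0]:.3f} vs {history[-1]:.3f}")
```

The point of the toy is only the division of labour: the sample efficiency shows up in the inner loop, while the outer loop shapes the learner that makes it possible.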