Field Note – A Brief History of Intelligence

Max Bennett’s “A Brief History of Intelligence” unravels the lineage of human cognition, tracing it from ancient survival instincts to the rise of Artificial Intelligence. It provocatively argues that true intelligence stems from evolutionary adaptations, highlighting the importance of understanding our biological past to navigate the AI future. Explore how evolution shapes intelligence.

Name: A Brief History of Intelligence

Author(s): Bennett, Max

Published: 2024

Reviewed:

The Core Problem: In an era of accelerating AI development, how can we move beyond surface-level hype and understand the fundamental principles of intelligence itself, so we can better anticipate the trajectory, capabilities, and limitations of the artificial minds we are building?

The Bottom Line

  1. What it is: A Brief History of Intelligence synthesises evolutionary neuroscience to explain how intelligence arose and developed in biological systems, framed as five major breakthroughs.
  2. Why it matters: It matters because by understanding this evolutionary journey, we can gain crucial insights into the challenges and potential pathways for creating truly capable Artificial Intelligence.
  3. What you’ll get: From this Note, you will get a first-principles history of the five breakthroughs that led to human cognition and a framework for applying this evolutionary understanding to the future of AI.

Time Commitment:

73–109 minutes

Disclaimer: This content is intended for educational, commentary, and review purposes only. All opinions expressed are my own and are not affiliated with the author or publisher of the book. Any copyrighted material, including quoted excerpts, is used under the principles of fair use for criticism and analysis. For further information or to support the author, please refer to the links mentioned at the beginning of this page.


The Strategist’s Briefing

“A Brief History of Intelligence” is a book that explores how humans became so smart, specifically due to five breakthroughs that happened in our evolutionary history – with subsequent breakthroughs building on the ones that came before.

Though the author, Max Bennett, calls them “breakthroughs”, one would be mistaken to think of them as breakthroughs in labs; for our ancestors living through those times, it was a matter of life and death, and the “breakthroughs” were simply adaptations born of necessity.

But the reason Bennett is taking us through the history of our brains is to tackle something more modern – Artificial Intelligence (AI).

He hopes that by understanding how our brains did and did not evolve, we will be able to understand something fundamental about intelligence itself, and hence be able to apply that understanding to the current artificial intelligence revolution, how it is expected to move, the things that it can be expected to achieve, and the things that it cannot.

Max Bennett is an entrepreneur and researcher who operates at the intersection of Artificial Intelligence and evolutionary neuroscience.

He brings a unique perspective to the topic, blending deep technical and business experience with scientific inquiry.

On the business side, Bennett is a successful AI entrepreneur. He co-founded multiple AI companies, most notably Bluecore, a marketing personalisation company that grew to be valued at over $1 billion and provides AI technologies to some of the world’s largest retailers.

On the research side, he has published numerous scientific papers in peer-reviewed journals, primarily on the topics of evolutionary neuroscience and the neocortex. He graduated summa cum laude from Washington University in St. Louis with a degree in economics and mathematics.

This dual background allows him to write “A Brief History of Intelligence” from a distinctive point of view, bridging the gap between how intelligence evolved naturally in biological systems over hundreds of millions of years and how we are attempting to build it artificially today.

More recently, he co-founded Alby, an AI company focused on creating guided shopping and search experiences. His work has earned him a spot on the Forbes 30 Under 30 list.

The reason I picked this book is because I am always looking for first principles explanations – and while there is a lot of hype around AI these days, many of those surface-level claims attempt to wow us with the “what”.

My objective is to understand the “how” of Artificial Intelligence – precisely so that I do not get wowed by the “what”.

Core Frameworks Deconstructed


Citation: All text highlighted in yellow in this section is cited from – Bennett, Max. A Brief History of Intelligence: Why the Evolution of the Brain Holds the Key to the Future of AI. Kindle Edition.


The pre-condition for intelligence

The author starts the book by talking about how life emerged on Earth; the primordial soup that gathered around hydro-thermal vents, the emergence of DNA, the creation of lipid membranes, the creation of proteins, and so on – I have already covered this in detail in my Field Note for David Sinclair’s Lifespan along with how the primitive gene circuit works to repair DNA damage.

And as I noted in my Field Note for Lifespan, while the initial emergence of these simple biological machines was random, once they had emerged, evolution could take over. The machines that had superior circuits (or intelligence if you will) for harnessing energy from their surroundings to repair and/or replicate would be rewarded by evolution.

This and other forms of molecular machinery – sometimes simpler, sometimes more complex – can, according to Bennett, be considered a “… primitive version of intelligence …”.

But it wasn’t true intelligence and there was still a long way to go from there to the modern humans of today.

Bennett briefly talks about LUCA (our “last universal common ancestor“) whose blueprint to life we all share – DNA, protein synthesis, lipids and carbs – from bacteria to a human.

Following LUCA’s emergence, life was simple, literally – the only living beings around were simple bacteria.

These ancient bacteria used hydrogen as fuel, which was readily available from volcanic activity and chemical reactions deep within the Earth’s crust, particularly around those same hydrothermal vents I mentioned earlier. They used this hydrogen to generate their own energy, allowing them to grow and replicate. But this process was inefficient.

But after some time (somewhere around a billion years) Cyanobacteria entered the scene and brought along with them the game changing technology called Photosynthesis – it was more efficient and allowed cyanobacteria to literally eat light – they used sunlight and CO2 to make sugar that could finance their energy needs.

But photosynthesis releases oxygen, and during that time there weren’t any life forms that could use the oxygen – so it just kept piling up in the Earth’s atmosphere (often called the “Great Oxygenation Event“) – this was bad news for life back then.

Because oxygen is very reactive and can interfere with cellular processes. So you do not want to be putting too much of it inside your body – but because our pre-historic photosynthetic life forms pumped the Earth full of it – they couldn’t help but breathe it in.

These early life forms, as the author puts it, “… became victims of their own success, slowly suffocating in a cloud of their own waste.”

And that would have been the end of life on Earth had it not been for another bacterium that emerged on the scene – The Respirator. These guys had the ability to use oxygen (along with sugar) to generate energy.

And what did they excrete? CO2 – the very thing Photosynthesisers needed! What luck!

And so a great symbiotic partnership was formed between the Respirators and the Photosynthesisers – the waste of one was food for the other.

But there was a wrinkle to this partnership – unlike the Photosynthesisers, Respirators needed to move around. And there is a simple explanation.

You see, the Photosynthesisers, they used sunlight and CO2. These were abundant and always available from the environment. They just had to sit there and eat. Their energy source was external and virtually limitless, as long as the sun shone and CO2 was present. They were producers.

Now, consider the “Respirators”:

  1. Their “food” was sugar (produced by Photosynthesisers) and oxygen (released by Photosynthesisers).
  2. Sugar isn’t omnipresent like sunlight. It’s locked inside other organisms (the Photosynthesisers, or the remnants of dead ones).
  3. Therefore, to get their energy, Respirators couldn’t just sit and wait. They had to actively find and consume these sugar-containing organisms or their remains – they needed to move, they needed to hunt.
  4. This act of finding and consuming other living (or recently living) things for energy is, at its most basic level, hunting or predation.

As Bennett points out, “… respiratory life could survive only by stealing the energetic prize—the sugary innards—of photosynthetic life.”

An evolutionary arms race began with respiratory life forms trying to survive by eating photosynthetic (and other respiratory) life forms, and photosynthetic (and other respiratory) life forms trying to survive by not getting eaten by respiratory life forms.

This led to a dramatic diversification of life on Earth. More sophisticated organisms, known as Eukaryotes, evolved – initially as single cells, then progressing to small multicellular forms, and eventually to the large, complex organisms we see today.

For these large multicellular eukaryotic life forms to function and propagate there was a need for a system wide response to environmental triggers.

And this is where the pre-condition for intelligence enters – The Neuron.

What first began as a need for single-celled respirators to move around, and then led to increasingly complex life forms dominating the competitive landscape, eventually produced a foundational unit to coordinate actions across a complex multicellular life form.

Intelligence exists because of the neuron, and the neuron exists because we needed to move around.

Principle: Any form of intelligence – whether biological or artificial – needs a foundational processing unit, capable of receiving, processing, and transmitting signals.

Application: In biological systems, particularly those that evolved for movement and interaction, this building block is the neuron. Appropriately, artificial intelligence (specifically, a large class of AI called neural networks) draws inspiration from this biological design, using ‘artificial neurons’ as its fundamental processing units.

Strategist’s Note: The real power of neurons, however, comes not from any individual neuron but from what a group of them, joined together in the form of a nervous system, can do. Similarly, one individual can only do so much, but when a group of individuals forms a collective – whether we call that collective a company or a nation – impact magnifies. If you wish to magnify impact, form a collective with like-minded others.

How neurons work

While neurons come in many shapes and sizes, and perform a vast array of specialised functions – from light-sensing neurons in your eyes to pressure-sensing neurons in your skin, or those that control muscle movement – their fundamental method of communication, based on electrochemical signals, has remained remarkably consistent since their early evolution.

  • Neurons are designed to send and receive signals in the form of electricity or chemicals. Signals travel within a neuron as electricity and between neurons as chemicals.
  • Unlike many other body cells which might be roughly spherical, neurons (or nerve cells) have a distinct, highly specialised shape with extended parts. Think of them as having a ‘sending’ end and a ‘receiving’ end, which gives them a specific direction for signal flow.
  • Electricity flows through neurons unidirectionally. Signals are typically received by branch-like structures called dendrites, which then transmit the electrical impulse through the cell body and down a long, cable-like extension called the axon. The axon then sends this signal onward to other neurons or cells.
  • The point where one neuron communicates with another isn’t a direct electrical connection. Instead, there’s a tiny gap called a synapse.
  • When an electrical signal (an action potential) reaches the end of an axon, it triggers the release of chemical messengers called neurotransmitters into this gap. The neurotransmitters then cross the gap and bind to specific receptors on the receiving neuron’s dendrites or cell body, influencing whether that neuron will fire its own signal. This chemical step allows for complex modulation and processing of signals.
  • A single neuron does not send signals in gradients – either it fires a signal or it keeps quiet. That is, a neuron operates like a switch – it is either on (transmitting a signal) or off (not transmitting).
  • The rate at which neurons send signals depends on the magnitude of the trigger they are designed to detect – the greater the magnitude of the trigger, the higher the rate at which the neuron will fire. This is known as “rate coding”. A neuron can typically fire at a maximum of about 500 “spikes” per second.
  • The rate at which a neuron fires depends not only on the magnitude of the trigger, but also how long the neuron has been exposed to it – if the exposure is long, then the firing rate will decrease as the neuron baselines to the situation, known as “adaptation“. This allows neurons to effectively process information across a much wider dynamic range of stimuli, preventing them from always firing at their maximum rate when a stimulus is strong but constant.
  • Neurons can either nudge action (“excitatory”) or suppress action (“inhibitory”). Excitatory neurons (and the neurotransmitters they release, like glutamate) make the receiving neuron more likely to fire an action potential. They “nudge” or push it towards its firing threshold. Inhibitory neurons (and their neurotransmitters, like GABA) make the receiving neuron less likely to fire an action potential. They “suppress” or pull it away from its firing threshold.
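To make rate coding and adaptation concrete, here is a minimal Python sketch. It is a toy model, not how Bennett describes the biology: the function names, the linear `gain`, and the adaptation `speed` are all assumptions of mine. The only figure taken from the text is the ceiling of roughly 500 spikes per second.

```python
MAX_RATE_HZ = 500.0  # approximate ceiling quoted in the text above


def firing_rate(stimulus: float, baseline: float = 0.0, gain: float = 100.0) -> float:
    """Rate coding: a stronger stimulus (relative to baseline) gives a higher rate.

    `gain` is a hypothetical scaling factor; the rate saturates at MAX_RATE_HZ.
    """
    drive = max(0.0, stimulus - baseline)
    return min(MAX_RATE_HZ, gain * drive)


def adapt(baseline: float, stimulus: float, speed: float = 0.5) -> float:
    """Adaptation: the baseline drifts towards a sustained stimulus."""
    return baseline + speed * (stimulus - baseline)


# Hold a constant stimulus of strength 2.0 for five time steps.
baseline = 0.0
rates = []
for _ in range(5):
    rates.append(firing_rate(2.0, baseline))
    baseline = adapt(baseline, 2.0)

print(rates)              # [200.0, 100.0, 50.0, 25.0, 12.5]
print(firing_rate(10.0))  # 500.0 (saturated)
```

The stimulus never changes, yet the firing rate falls as the neuron re-baselines, which is what leaves headroom to signal a further increase.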

It’s crucial to remember that individual neurons don’t operate in isolation. They connect with thousands of other neurons, forming incredibly complex neural networks or circuits. As we’ll see, the brain’s power comes from the collective activity of these interconnected neurons, constantly sending and receiving signals, integrating information, and orchestrating everything from our simplest reflexes to our most abstract thoughts. This network effect is what allows for the emergence of sophisticated intelligence.
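The “nudge” and “suppress” idea, combined with the all-or-nothing firing rule, can be sketched as a single threshold unit. This is roughly the abstraction that artificial neurons borrow, and it is a deliberate oversimplification: the function name, the input strengths, and the threshold value are all hypothetical.

```python
from typing import Sequence


def fires(excitatory: Sequence[float], inhibitory: Sequence[float], threshold: float = 1.0) -> bool:
    """All-or-nothing unit: fire only if net input reaches the threshold.

    Excitatory inputs push towards the threshold, inhibitory inputs pull away.
    """
    net_input = sum(excitatory) - sum(inhibitory)
    return net_input >= threshold


print(fires([0.6, 0.6], []))     # True: two nudges are enough
print(fires([0.6, 0.6], [0.5]))  # False: one inhibitory input vetoes the spike
```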

Principle: In nature, you will never find an example of something complex happening from the very beginning. Anything complex that you see today – like human beings – started out as something much simpler.

Application: Life itself began as a simple DNA molecule encased in a lipid layer. These “protocells” were very simple and did not have the complexity of the myriad cellular structures (and the organisms they are part of) today. Layer upon layer, complexity was added, with the environment administering the test each time to see if the newer, more complex organism was worthy of keeping around.

Strategist’s Note: Whenever you are starting something, start small, start simple – “MVP” or “Minimum Viable Product” as it is popularly called today – and then add complexity from there. Your market will administer the test at each level of complexity to see if the current version is good. I have seen examples in my career where teams set out to make the Taj Mahal from the get go, whereas they should have just focused on building a hut and seeing if it could stand on its own.

Moving intelligently

How does one hunt?

One can either sit around and wait, attacking when unsuspecting prey wanders into the trap (ambush predation), or go out and catch the prey (pursuit predation).

Both forms of hunting are available to see in the world: coral polyps typically remain stationary in wait and then grab any piece of food that floats by with lightning speed, barracudas will use a mix of ambush and pursuit, while sharks will actively hunt their food.

Ambush predation needs the organism to move in place, while pursuit predation needs it to move around – and although both require a coordinated response by the organism (hence, the need for neural circuits), the latter is more computationally demanding.

Because moving around randomly is an express ticket to extinction – organisms needed to figure out a way to move intelligently. In other words, they need to learn how to steer – this is Bennett’s first breakthrough in the history of intelligence.

This need to steer contributed to the emergence of bilateral (two mirrored halves) body shapes instead of radial (circular) ones, because the former needs a simpler system – a way to move forward, a way to turn, and a way to tell when to move and when to turn.

And it was this same need to steer that led to the creation of valence – i.e. a sense of “good” and “bad”. To steer effectively, an organism needs to know what to steer towards (food or safety) and what to steer away from (predators or toxins)1.

Bennett points out how the development of “adaptation”, i.e., the ability of neurons to reduce their firing rate (baseline) under a constant stimulus intensity, was a necessary precursor to steering: the organism worked out its next move from the relative concentration of stimuli, moving in the direction where the concentration was rising. As I said, subsequent breakthroughs built on ones that came before.
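Steering by relative concentration can be sketched in a few lines of Python. This is my own toy illustration, not something from the book: the grid world, the `concentration` function (strongest at the origin), and the “turn clockwise when things get worse” rule are all assumptions. What it tries to capture is that the agent only ever compares the current reading with the previous one.

```python
# Headings in clockwise order: east, south, west, north.
HEADINGS = [(1, 0), (0, -1), (-1, 0), (0, 1)]


def concentration(x: int, y: int) -> int:
    """Hypothetical food signal, strongest at the origin (0, 0)."""
    return -(abs(x) + abs(y))


def steer(x: int, y: int, max_steps: int = 200):
    """Keep going while the signal improves; turn when it does not."""
    heading = 0
    previous = concentration(x, y)
    steps = 0
    while steps < max_steps and abs(x) + abs(y) > 1:
        dx, dy = HEADINGS[heading]
        x, y = x + dx, y + dy
        steps += 1
        current = concentration(x, y)
        if current <= previous:          # no improvement -> turn clockwise
            heading = (heading + 1) % 4
        previous = current               # re-baseline to the latest reading
    return x, y, steps


print(steer(10, 7))  # (0, -1, 20): within one step of the food
```

The agent has no map and no memory beyond one reading, yet it reaches the food. Forward, turn, and a rule for when to do which are enough.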

Anyway, merely having the ability to imbue stimuli with valence (that is, label something as “good” or “bad”) is not enough because in the real world there can be thousands of stimuli hitting sensory neurons at the same time – the organism needs a way to weigh the different streams of stimuli against each other and then decide what to finally do – a sort of headquarters if you will – in other words, a brain.

This brain was much simpler than the brains we see today, just a big circuit of neurons that could “decide” what to do. As Bennett points out, “The first brain was this mega-integration center—one big neural circuit in which steering directions were selected.”

Inhibitory neurons (ones that inhibit action) helped this “proto-brain” by ensuring only one action would be done at a time.

For most things the organism also needs to know about its internal state before making a decision.

For example, a starving organism may decide it is worth moving into a toxic environment because of the strong smell of food coming from there, whereas a well-fed organism may pass on the gamble.

This meant that the proto-brain also needed a way to know the state of its body – this was done through internal signalling methods such as hormones that sensory neurons could detect.
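The starving-versus-well-fed example can be written as a tiny decision rule. Treat this as a sketch under stated assumptions: the valence numbers, the `hunger` scale from 0 to 1, and the idea that hunger multiplies only food-related signals are mine, chosen to reproduce the example above, and are not taken from the book.

```python
def choose_action(stimuli: dict, hunger: float) -> str:
    """Integrate competing valenced stimuli into exactly one action.

    `stimuli` maps a name to (valence, is_food); `hunger` lies in [0, 1].
    """
    net = 0.0
    for valence, is_food in stimuli.values():
        weight = hunger if is_food else 1.0  # internal state scales food signals only
        net += valence * weight
    # Only one action wins, mirroring the role of inhibitory neurons.
    return "approach" if net > 0 else "avoid"


scene = {"food_smell": (1.0, True), "toxin": (-0.6, False)}
print(choose_action(scene, hunger=1.0))  # approach: starving, worth the gamble
print(choose_action(scene, hunger=0.2))  # avoid: well fed, passes on it
```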

Additionally, while valence provided a fundamental sense of ‘good’ or ‘bad’ (a spectrum from beneficial to harmful), proto-brains also evolved “arousal“.

Arousal provided the crucial dimension of intensity or urgency for these valenced stimuli.

A high arousal signal, for instance, might mean ‘this is very good and urgent to pursue’ (like a strong food smell for a starving organism) or ‘this is very bad and urgent to avoid’ (like the imminent threat of a predator).

Together, valence and arousal determine an organism’s “affective state” – grossly oversimplified this means how the organism is feeling.

Valence describes how good or bad it is feeling (about itself or something that it perceives in its environment). And arousal defines whether it wants to do something about it or not.

This was the beginning of emotion.

Persisting and learning

Emotions bring us perfectly to the next step in the evolutionary journey – Bennett points out that the “… defining feature of these affective states is that, although often triggered by external stimuli, they persist for long after the stimuli are gone.”

This is not a “breakthrough” as classified by Bennett in the book (he still classifies it under “steering”), but to me it’s as big.

As you should know, efforts rarely yield rewards immediately and consistently, and often you just have to be persistent.

Whether it’s the mate you are trying to attract or the animal you are trying to hunt – persistence sometimes accords you an evolutionary advantage.

It is key neuromodulators, specifically dopamine and serotonin, that play a crucial role in determining whether we persist at something or not.

Dopamine is released in anticipation of a reward, such as food. And the amount released depends on the internal state of the animal, such as hunger versus satiety.

The “reward” here may not just be a reward in the traditional sense, it can also be “relief” – such as escaping from a lion – even in these cases dopamine is released.

When an action leads to a reward for the first time, dopamine is released shortly after the reward. But over time, it starts getting released before the reward, that is, when the organism receives cues that make it anticipate the reward.

If the reward then occurs as predicted by the cue, there’s often no change in dopamine because it’s already been “accounted for” by the cue earlier.

But if the predicted reward fails to appear after the cue, dopamine activity actually dips below baseline, signalling a “negative prediction error” – the reward was worse than expected.

Conversely, if the reward is better than predicted, there’s a surge in dopamine, signalling a “positive prediction error” – the reward was better than expected.

In this way, proto-brains learned to associate cues with rewards, as well as modify their learning in cases when the cue did not lead to the reward.
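The three dopamine cases above (as expected, worse than expected, better than expected) boil down to one subtraction. Here is a minimal sketch. The update rule and the `learning_rate` of 0.5 are simplifying assumptions of mine, not figures from the book.

```python
def prediction_error(reward: float, expected: float) -> float:
    """Positive: better than predicted. Negative: worse. Zero: as predicted."""
    return reward - expected


def learn_cue(trials: int, reward: float = 1.0, learning_rate: float = 0.5) -> float:
    """Repeatedly pair a cue with a reward; return the learned expectation."""
    expected = 0.0
    for _ in range(trials):
        expected += learning_rate * prediction_error(reward, expected)
    return expected


expected = learn_cue(trials=20)              # cue now predicts the reward almost fully
as_predicted = prediction_error(1.0, expected)  # close to zero: nothing new to learn
omitted = prediction_error(0.0, expected)       # negative: the dip below baseline
bonus = prediction_error(2.0, expected)         # positive: the surge
```

The first pairing produces the largest error, and each repeat produces a smaller one, which matches the description of the dopamine response moving away from the reward itself once the cue predicts it.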

This ability, known as “associative learning”, is unique to bilateral organisms – as Bennett points out, “Once animals began approaching specific things and avoiding others, the ability to tweak what was considered good and bad became a matter of life and death.”

This was the dawn of learning.

We can see here how intelligence is transforming – from real time responses to the environment, it now was becoming able to predict and anticipate the future.

Let’s talk about the “credit assignment problem“.

Just as with real-time responses to stimuli, where the organism needs to decide which of the numerous stimuli to finally act on, in associative learning the organism needs to decide which of the numerous stimuli to associate with the reward.

Incorrect association will lead to wasted energy for the organism and may even be fatal.

And the common-sense way to solve the problem is: The organism should associate a stimulus with the reward if …

  1. … the stimulus and the reward occur very close to each other in time (a.k.a. “eligibility traces”).
  2. … the stimulus is something new that has not happened before (a.k.a. “latent inhibition”).
  3. … the stimulus is stronger than other stimuli (a.k.a. “overshadowing”).

And to keep things simple, proto-brains also practise “blocking”: once a proto-brain has associated a stimulus with a reward, it will not easily change that association or associate other stimuli with the same reward.
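Blocking falls out naturally if all cues present share a single prediction error. The sketch below uses the Rescorla-Wagner rule, a classic model from the psychology of conditioning. Using it here is my choice for illustration, not something this section attributes to Bennett. The cue names and the learning `rate` are hypothetical.

```python
def train(strengths: dict, cues: list, trials: int, reward: float = 1.0, rate: float = 0.3) -> None:
    """Rescorla-Wagner style update: all present cues share one prediction error."""
    for _ in range(trials):
        error = reward - sum(strengths[c] for c in cues)
        for c in cues:
            strengths[c] += rate * error


# Blocking: the light is learned first, then light and tone appear together.
blocked = {"light": 0.0, "tone": 0.0}
train(blocked, ["light"], trials=50)
train(blocked, ["light", "tone"], trials=50)

# Control: light and tone are paired with the reward from the start.
control = {"light": 0.0, "tone": 0.0}
train(control, ["light", "tone"], trials=50)

print(round(blocked["tone"], 3))  # 0.0: the tone learned next to nothing
print(round(control["tone"], 3))  # 0.5: the tone shares the credit
```

Because the light already predicts the reward, there is almost no error left for the tone to absorb. The existing association “blocks” the new one.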

Now, a learning proto-brain would not be very successful if it did not have a place to store data (and overwrite it if needed).

This is where evolution did something very cool – unlike our computers which have a separate part for computation (CPU) and a separate part for data storage (hard drive) – in brains the computation and the storage is enabled by the same foundational unit (the neuron).

Neurons not only power the operating circuit orchestrating the organism’s response to stimuli, they also serve as the foundation for memory. Your memory is the configuration and strength of neuronal connections in your brain.

The maxim commonly used to summarise Donald Hebb’s theory, “Neurons that fire together wire together”, encapsulates this.

Memories are encoded in the patterns of activity and the enduring changes in the strength and number of connections (synapses) between neurons.

A brain “learns” when neurons connect with each other – this connection happens when two neurons fire at the same time2.

Two neurons fire at the same time when incoming stimuli make them fire at the same time. For example, if neuron A (detecting the sight of a tiger) fires, and simultaneously neuron B (detecting the sound “growl”) fires.

Then, as Bennett simplifies, there is “… clever protein machinery …” in the brain that connects those two neurons together (or if already connected, strengthens the connection between them).

So, the next time the brain hears a growl, the “growl” neuron might be able to activate the “sight” neuron more easily, even without the organism actually seeing the tiger, and make the organism ready to take flight.

This strengthening (and weakening) of connections between neurons is, at the mechanistic level, what we call memory/learning3.
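The tiger-and-growl example can be sketched as a single connection weight that grows when both neurons fire together. This is a cartoon of Hebbian learning rather than a biological model: the `rate`, the slow `decay` when the neurons do not fire together, and the cap at 1.0 are all assumptions of mine.

```python
def hebbian_update(weight: float, pre_active: bool, post_active: bool,
                   rate: float = 0.1, decay: float = 0.01) -> float:
    """Strengthen the connection on co-activation; otherwise let it fade slowly."""
    if pre_active and post_active:
        return weight + rate * (1.0 - weight)  # grows towards a ceiling of 1.0
    return weight * (1.0 - decay)              # hypothetical slow weakening


# The growl neuron and the tiger-sight neuron fire together ten times.
growl_to_sight = 0.0
for _ in range(10):
    growl_to_sight = hebbian_update(growl_to_sight, True, True)

# Later the growl is heard on its own: the link weakens slightly.
after_lone_growl = hebbian_update(growl_to_sight, True, False)
```

Note that the “memory” here is nothing but the number stored in `growl_to_sight`: the same unit that carries the signal also stores what was learned.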

Principle: Ultimately, all forms of information (words on paper, digits on calculator, beads on abacus, memories in brain, chords on guitar, genes in DNA etc.) are nothing more than a configuration of some physical foundational unit (ink, crystals, wood, neurons, steel strings, nucleotides etc.). Configurations do not live on the same plane of existence as, and are independent of, the physical units that underlie them.

Application: You can use different physical units to create the same configuration. Such as playing the same song on a guitar or a speaker, or reading the same words on a page or a Kindle. Theoretically, with the right underlying physical units, this should mean that sci-fi concepts such as uploading “yourself” (as your memories) into a computer should be possible. Practically, this means that the universe will ultimately reward the configuration and not the physical substrate – this is as true for evolution that rewards the right configuration of nucleotides and genes with an organism that survives long enough to reproduce, as it is true for markets that reward the right intellectual property with year on year revenue growth.

Strategist’s Note: In the world of business, this concept implies that ultimately you are not providing your customers something material (“we have the best coffee”) – you are competing on a configuration (“we put you in a clear state of mind”).

These systems combined – a system to tell good from bad, a system to motivate action based on that assessment, and a system to re-evaluate learning and store it – were the strong MVP (minimum viable product) that intelligence needed.

From that point on, it was just about improving on the core product. As Bennett points out, “From the bilaterian brain onward, the evolution of learning was primarily a process of finding new applications of preexisting synaptic learning mechanisms, without changing the learning mechanisms themselves.”

Choosing between what you want now and what you want most

Real life is more than just a simple “cause – effect” maximisation game. Action does not immediately lead to reward, and rewards are not just the consequence of the immediately preceding action.

IRL, organisms need to intelligently perform a sequence of steps to arrive at the optimal outcome.

This means they must be smart enough to know what action, even though it leads to short term gain, is bad for the long term (for instance, a morsel of food that just happened to be lying around might have been bait).

When it comes to the rewards that matter most to an organism (such as an abundant food supply, mates, shelter), it is better to plan. While lucking upon those rewards is totally possible, organisms over-relying on purely opportunistic approaches (i.e. the associative learning MVP I described above) would have associated only the most immediate action with the reward, without consideration for the fact that the prior steps were all equally important. Compared with organisms that could plan, they would have been at an evolutionary disadvantage because their long-term survival would have relied too much on luck.

To meet the challenge of a world where the best rewards were likely behind several carefully planned moves – intelligence developed its second breakthrough: reinforcement learning.

Associative learning (AL) is about learning that two things go together. It’s about forming connections or associations between stimuli, or between a behaviour and its consequence.

Reinforcement learning (RL) is where an organism learns how to behave in a dynamic environment to maximise cumulative reward over time. It’s about learning a long-term strategy.

RL is more challenging than AL because:

  1. Moving in the dark: The organism needs to make the right moves that, right now, may not even seem right (e.g., a chess game: a move might not immediately win, but sets up a future checkmate).
  2. Assigning credit across time: And when the organism does clearly make a right move (one that immediately triggers AL), it needs to assign credit not just to that move, but to all the preceding moves. This is known as the “temporal credit assignment problem“.

What do you do here? A situation where you are “moving in the dark”, feedback is not powerful enough to trigger AL, and the reward is a long way away.

Trial and error.

And that is what evolution did: instead of releasing dopamine only when the organism achieves the final reward, a brain using RL releases dopamine when the organism thinks it is getting closer to achieving the final reward.

And when the organism thinks it made a bad move? The opposite: dopamine dips below baseline.

Instead of a burst of dopamine (positive prediction error), there’s a suppression or dip in dopamine neuron firing below the normal, baseline level.

This dip signals that the outcome was worse than predicted or that a valuable opportunity was missed. It’s a “bad news” signal from the brain’s prediction system.

With this approach, RL can “… reinforce some moves and punish others throughout a long game whether or not [the organism] won or lost the overall game.” – Bennett calls this the “grand repurposing of dopamine”.

An organism (or AI agent) can learn valuable lessons about which intermediate actions were good or bad even if the entire sequence ultimately failed (e.g., losing a chess game, but realizing several individual moves were still strong). Conversely, a “win” might still have included some sub-optimal moves that the system would learn to avoid in the future.

This continuous, in-game feedback (via dopamine prediction error) is what allows for efficient learning in complex, long-duration tasks. This is the core idea behind Temporal Difference (TD) Learning. And the key to TD Learning is the Temporal Difference Error (TD Error).

TD Error = (Immediate Reward + Discounted Value of Next State) [4] – Current Value of Current State.

Let’s break that down:

  • Immediate Reward: What you received from the state you entered.
  • Discounted Value of Next State: Your current best prediction of all future rewards you’ll get from the state you just entered (discounted because a future reward is usually valued less than an immediate one).
  • Current Value of Current State: What you expected to get from this state.
  • Positive TD Error (Reward Better Than Expected / Getting Closer): When the (Immediate Reward + Discounted Value of Next State) is greater than what was expected from the current state, dopamine neurons show a burst of activity. This reinforces the actions that led to this “better-than-expected” situation, even if the final reward is still far off.
  • Zero TD Error (Reward As Expected): If the prediction matches reality, i.e. (Immediate Reward + Discounted Value of Next State) exactly equals the current state’s value, dopamine activity stays at baseline. The brain has nothing new to learn.
  • Negative TD Error (Reward Worse Than Expected / Moving Away): If the (Immediate Reward + Discounted Value of Next State) is less than what was expected, dopamine activity dips below baseline. This signals a “negative prediction error” and “punishes” the recent actions, teaching the organism to avoid them.
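
To make the formula concrete, here is a minimal sketch of the TD Error as a function. The function names, the discount factor of 0.9 and the example numbers are my own assumptions for illustration; they are not from Bennett's book.

```python
def td_error(reward, next_value, current_value, gamma=0.9):
    """TD Error = (immediate reward + discounted value of next state) - value of current state.

    gamma is the discount factor (0.9 is an assumed, illustrative value).
    """
    return (reward + gamma * next_value) - current_value


def dopamine_signal(error, tolerance=1e-9):
    """Map the sign of the TD Error onto the three cases described above."""
    if error > tolerance:
        return "burst"      # better than expected
    if error < -tolerance:
        return "dip"        # worse than expected
    return "baseline"       # exactly as expected


# Three hypothetical moves in a long game, none of which ends the game:
strong_move = td_error(reward=0.0, next_value=10.0, current_value=5.0)
neutral_move = td_error(reward=1.0, next_value=10.0, current_value=10.0)
weak_move = td_error(reward=0.0, next_value=2.0, current_value=5.0)

for name, err in [("strong", strong_move), ("neutral", neutral_move), ("weak", weak_move)]:
    print(name, dopamine_signal(err))
```

Notice that no move here wins or loses the game, yet each one still produces a learning signal. That is the whole point of TD Learning.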

Bennett explains this using Sutton’s “Actor – Critic” model: the “Actor” is the part of the organism’s neural circuitry that takes actions, and the “Critic” is the part of the organism’s neural circuitry that evaluates the Actor’s decisions.

The Critic is always trying to drive the TD Error to zero (i.e. to have perfect predictive power), while the Actor is always trying to surprise the Critic positively.

The Critic’s primary goal is to learn the most accurate value function possible. When the TD error is zero, the Critic’s learning has converged; it has achieved perfect predictive power based on the current policy.

When the Actor takes an action that leads to a positive TD Error (meaning the outcome was better than the Critic predicted from the previous state), the Actor’s action is reinforced.

This “positive surprise” for the Critic means the Actor found a path to a more valuable state than initially expected.

So, the Actor is constantly experimenting (exploring) and refining its actions to generate these positive TD errors, which are its “rewards” for effective behavior. It’s essentially “surprising the Critic” by making better moves than the Critic initially accounted for.
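
The interplay can be sketched in a few lines of code. What follows is a toy, tabular Actor-Critic on a hypothetical five-state corridor. The environment, the constants and the simplified update rule (both sub-systems are nudged by the same TD Error) are my assumptions for illustration, not Bennett's or Sutton's exact formulation.

```python
import math
import random

N_STATES = 5            # hypothetical corridor: states 0..4, state 4 is the goal
LEFT, RIGHT = 0, 1
GAMMA = 0.9             # discount factor (assumed)


def choose_action(prefs, rng):
    """Actor: sample an action with softmax probabilities over its preferences."""
    top = max(prefs)
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices([LEFT, RIGHT], weights=weights)[0]


def step(state, action):
    """Environment: move along the corridor; reaching the goal pays 1."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


def train(episodes=2000, alpha=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    values = [0.0] * N_STATES                      # the Critic's predictions
    prefs = [[0.0, 0.0] for _ in range(N_STATES)]  # the Actor's action preferences
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if state == N_STATES - 1:
                break
            action = choose_action(prefs[state], rng)
            nxt, reward = step(state, action)
            td_error = reward + GAMMA * values[nxt] - values[state]
            values[state] += alpha * td_error          # Critic: try not to be surprised again
            prefs[state][action] += alpha * td_error   # Actor: repeat what surprised pleasantly
            state = nxt
    return values, prefs


values, prefs = train()
greedy = ["right" if p[RIGHT] > p[LEFT] else "left" for p in prefs[:-1]]
print(greedy)
```

The Critic never picks a move and the Actor never predicts a value, yet the single TD Error signal is enough to train both.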

Principle: Biological intelligence needed long-term planning ability. To address this, a powerful form of learning called Reinforcement Learning (RL) evolved. One particularly effective way RL works (and which scientists believe mirrors biological processes) is through an ‘Actor-Critic’ model, which uses two sub-systems working towards a shared goal. The Actor’s goal is to take actions that will positively surprise the Critic (by leading to better-than-expected outcomes), while the Critic’s goal is to never be surprised by the Actor’s actions (by making predictions that are perfectly accurate).

Application: Every time an organism (including you) makes a move that lands it in a better position than it had anticipated before making the move, it remembers that sequence of moves (at least the ones nearest to the favourable position, if not all the moves from the very beginning) as a strategy for arriving at that favourable position. Parallels of the Actor-Critic model are seen in daily life, such as the Maker-Checker model followed in banks and accounting departments (where the maker’s goal is that the checker never finds a fault, while the checker’s goal is to look for them).

Strategist’s Note: Fundamentally, the Actor and the Critic represent the “bounds” or the “counter balance” needed for any system to succeed. And if evolution needed it to make organisms that could survive and reproduce, it makes sense to learn from it. So, if you are interested in making a system, even something as simple as a system for keeping your weight in check, you must establish an Actor and a Critic that will act as counters to each other. Systems with singular, all-powerful decision makers are fragile.

Do you have the time?

The evolution of a precise internal clock (time perception) was another foundational breakthrough because it enabled the brain to:

  • Generate accurate temporal predictions.
  • Produce precise disappointment (negative TD error) when something good that was expected fails to occur at the predicted time.
  • Produce precise relief (positive TD error) when something bad that was expected fails to occur at the predicted time.

Bennett argues that while basic life forms have rudimentary ways of tracking time (like circadian rhythms), the ability of vertebrates (and some advanced invertebrates, though he highlights the gap) to precisely measure time intervals is a game-changer for intelligence.

This precision in temporal prediction error is what allows for sophisticated reinforcement learning in environments where rewards and punishments are not immediate or are precisely timed. It moves learning beyond simple contiguity to complex temporal relationships.

Imagine you receive a cue (e.g., a light) that reliably predicts a painful zap will occur in exactly five seconds.

If you don’t have an internal clock, you’d know “light = zap sometimes.” You might cower for a long time, because you don’t know when the danger has passed. With an inner clock, you can predict: “Zap at precisely T+5 seconds.”

If T+5 seconds passes and no zap occurs, your precise internal clock signals a positive prediction error (relief): the outcome was better than predicted. This teaches you: “After T+5 seconds, the danger is over.”
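
The light-and-zap example can be sketched with the same TD machinery by giving the learner one value per clock tick after the cue. This is a simplification I am assuming for illustration (one tick per second, no discounting, hypothetical function names), not a model taken from the book.

```python
def learn_cue_to_zap(zap_tick=5, zap=-1.0, alpha=0.5, trials=200):
    """Learn one value per clock tick after the cue; the zap arrives on the last tick."""
    values = [0.0] * (zap_tick + 1)   # values[zap_tick] means "after the zap" and stays 0
    for _ in range(trials):
        for tick in range(zap_tick):
            outcome = zap if tick == zap_tick - 1 else 0.0
            td_error = outcome + values[tick + 1] - values[tick]   # no discounting, for simplicity
            values[tick] += alpha * td_error
    return values


def td_errors_when_zap_is_omitted(values):
    """Replay one trial in which the expected zap never comes."""
    return [0.0 + values[tick + 1] - values[tick] for tick in range(len(values) - 1)]


values = learn_cue_to_zap()
errors = td_errors_when_zap_is_omitted(values)
print([round(e, 3) for e in errors])
```

After training, the only sizeable error in the omission trial appears on the tick where the zap was due, and its sign is positive. Without a value per tick (that is, without a clock), there would be no specific moment at which relief could be signalled.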

Decoding patterns

As I said earlier, information is just configuration. When information is in a configuration we recognise, it is called a pattern. Pattern recognition is evolutionarily beneficial because it enables an organism to:

  1. Predict the Future: This is arguably the most crucial benefit. If an organism can recognize a sequence of events (a pattern), it can anticipate what will happen next. Example: Dark clouds (pattern) predict rain (future event). A rustling in the bushes (pattern of sound) predicts a predator (future event). The smell of a specific plant (pattern of scent) predicts that it is edible (future event).
  2. Efficiently Process Information: The world is full of sensory data. Without pattern recognition, every sensory input would be treated as novel and require full processing. Example: Instead of seeing every individual tree, an animal recognises the “forest” pattern. Instead of processing every individual letter, a human recognises the “word” pattern.
  3. Identify Resources and Dangers: Patterns are often reliable indicators of what’s good and what’s bad in the environment. Example: The distinct markings of a poisonous frog (visual pattern), the sound of a certain type of bird (auditory pattern) indicating water nearby, the specific footprint of a dangerous predator (spatial pattern).
  4. Learn and Adapt More Quickly: Pattern recognition is foundational to learning. The brain learns patterns of stimuli and response. Example: A child learns the pattern of sounds in a word, connecting it to an object. An animal learns the pattern of a successful hunting strategy.
  5. Navigate and Orient: Many patterns are spatial, helping organisms understand their environment. Example: Landmarks, the layout of a territory, the contours of a landscape.

After establishing the foundational “proto-brain” and its ability to learn from consequences, intelligence took its next leap: pattern recognition.

This wasn’t merely about seeing; it was about identifying specific configurations of information, whether a sight, a sound, or a smell.

A pattern-recognising brain had to solve two critical problems simultaneously: discrimination – telling apart subtly similar but distinct patterns (like the rustle of leaves versus the sound of a predator); and generalisation – recognising the same pattern even when it appeared in varied forms (like a tiger seen from different angles or in different light).

The brain solved these challenges through an intricate architecture, leveraging specialised neurons in the cortex that create unique, high-dimensional representations for distinct inputs, while also forming recurrent connections that allow a pattern to be recognised even from partial information.
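
Recognising a pattern from partial information through recurrent connections can be illustrated with a Hopfield network, a classic textbook model. To be clear, this is a technique I am swapping in for illustration: it is not Bennett's description of cortical circuitry, and the two eight-element "patterns" are made up.

```python
def train_hopfield(patterns):
    """Store +1/-1 patterns in symmetric recurrent weights (Hebbian rule, no self-connections)."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, cue, sweeps=5):
    """Update units one at a time; 0 in the cue means 'this part was not sensed'."""
    state = list(cue)
    for _ in range(sweeps):
        for i in range(len(state)):
            drive = sum(w * s for w, s in zip(weights[i], state))
            if drive != 0:
                state[i] = 1 if drive > 0 else -1
    return state


tiger = [1, 1, 1, 1, -1, -1, -1, -1]      # hypothetical stored pattern
leaves = [1, -1, 1, -1, 1, -1, 1, -1]     # a second, distinct stored pattern
weights = train_hopfield([tiger, leaves])

half_hidden = [1, 1, 1, 1, 0, 0, 0, 0]    # only half of the tiger is visible
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]     # the tiger with one element corrupted

print(recall(weights, half_hidden) == tiger)
print(recall(weights, noisy) == tiger)
print(recall(weights, leaves) == leaves)
```

The first two recalls are generalisation (a degraded view still settles on the stored pattern), and the third is discrimination (the other stored pattern is kept apart).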

This biological solution to pattern recognition demonstrates robustness and efficiency.

Unlike many of today’s powerful artificial neural networks, which often suffer from catastrophic forgetting (erasing old knowledge when learning new things) and struggle with invariance (recognising patterns despite basic transformations like rotation or scale without extensive retraining), even a fish’s brain handles these seamlessly.

In certain aspects, this highlights the superiority of biological brain design (for now) – allowing organisms to continuously integrate new information and identify threats and opportunities in an ever-changing world.

For a deeper look into how exactly the brain evolved these structures for robust pattern recognition, I highly recommend diving into Bennett’s book.

Seeking Novelty

Another significant leap for intelligence was the emergence of curiosity (which also happens to be a Sunchaser Value).

Curiosity was the response that intelligence gave to life’s continual challenge: the “exploitation-exploration dilemma”.

Basically, it is the decision we need to take from moment to moment whether we should continue with an activity with known rewards (“exploitation”) or seek out newer activities which may give us a better reward but may also be dead ends.

By making curiosity intrinsically rewarding, intelligence ensured that organisms sought novelty without immediate survival need.

Let’s call this “curiosity module” in these proto-brains the “Child”.

Guided by the Child, organisms sought to understand the world for its own sake. This was different from why the Critic (in the Actor-Critic Model) wanted to understand the world.

Both the Child and the Critic want to acquire a superior understanding of how the world works, but the Critic wants it in order to maximise an end goal, while for the Child a superior understanding of the world is its own reward.

In other words, like food and sex (which happen to be the end goals of the Critic), curiosity is intrinsically motivating.

Talking about intrinsic motivation – desire for food is intrinsically motivating, desire for sex is intrinsically motivating – these help in gene propagation so it is easy to understand why the genes powering a successfully propagating organism needed to make these intrinsically motivating. But curiosity?

Why would evolution bother to hardwire a drive for something as seemingly abstract as “information” or “novelty”?

Preparation for the Unknown (Proactive Adaptation):

The world is constantly changing and unpredictable. Resources can deplete, new predators can emerge, and environments can shift.

An organism that only exploits known resources is vulnerable when those resources disappear or when novel threats appear.

Curiosity drives exploration of the unknown environment before a direct survival need arises. This “preparatory learning” allows an organism to gather information about potential food sources, new hiding spots, the layout of a territory, or the behaviours of other species (both prey and predator) in advance. This stored information provides a future adaptive advantage.

Increased Learning Efficiency:

Curiosity, by making novelty and the reduction of uncertainty rewarding, optimizes the learning process itself.

Instead of random, inefficient exploration, curious organisms (or AI agents) are motivated to seek out areas where their internal “model of the world” is incomplete or inaccurate. They are driven to resolve prediction errors about the environment’s dynamics, not just reward prediction errors.

This more intelligent exploration leads to faster, more comprehensive learning about the environment, which indirectly translates to better resource acquisition, better threat avoidance, and ultimately, better survival and reproduction.
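
Here is a small sketch of what "seeking out where the model is wrong" could look like. The agent's only reward is the size of its own prediction error. The place names, outcomes and the rule "visit wherever the last surprise was largest" are my assumptions; real curiosity-driven RL systems are considerably more elaborate.

```python
def explore(true_outcomes, steps=30, learning_rate=0.5):
    """Curiosity-only agent: it is rewarded by prediction error, not by the outcome itself."""
    predictions = {place: 0.0 for place in true_outcomes}   # the internal world model
    surprise = {place: 1.0 for place in true_outcomes}      # unvisited places count as novel
    visits = {place: 0 for place in true_outcomes}
    for _ in range(steps):
        place = max(surprise, key=surprise.get)    # go where the model was last most wrong
        error = true_outcomes[place] - predictions[place]
        predictions[place] += learning_rate * error
        surprise[place] = abs(error)               # intrinsic reward: size of the prediction error
        visits[place] += 1
    return predictions, visits


# Hypothetical world: the meadow behaves exactly as the agent already expects.
predictions, visits = explore({"meadow": 0.0, "cave": 5.0, "river": -3.0})
print(visits)
```

The meadow is visited once and then ignored because there is nothing left to learn there, while the cave and the river keep drawing the agent back until its model of them is accurate.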

Survival in Resource-Scarce or Complex Environments:

In environments where immediate rewards are sparse or delayed, pure external reinforcement learning might be too slow. Curiosity provides an “internal compass” that drives learning even when extrinsic rewards are absent.

This allows organisms to build complex internal models of their world, which are invaluable for long-term planning and decision-making in challenging conditions.

Principle: In the evolution of intelligence, “Curiosity”, the seeking of novelty without immediate survival need, was made intrinsically motivating. Organisms pursue curiosity for itself.

Application: A child in a new room will naturally start exploring, touching and handling objects. The child presses a button on a strange toy, and is surprised as the toy lights up and plays music. This “surprise” about the toy’s behaviour (world dynamics) is itself rewarding, making the child want to play with the toy more, even if no other reward is involved.

Strategist’s Note: While food and sex are direct routes to gene propagation, curiosity is a meta-strategy for survival and reproduction. It’s the evolutionary bet that investing in information gathering and understanding the world for its own sake will pay off handsomely in adaptability, resilience, and the ability to find and exploit future opportunities, making the organism more likely to pass on its genes in a dynamic and unpredictable world. Stay curious.

Constructing Cognitive maps

The next big leap for intelligence was the ability to create internal maps – organisms were now able to remember where they were in space.

Bennett says the “… evolution of spatial maps in the minds of early vertebrates marked numerous firsts.”

Organisms could now understand:

  • Locations: Where specific objects, resources (food, water, shelter), or dangers are.
  • Relationships: The distances and directions between these locations.
  • Routes: Possible pathways to get from one location to another.

This “map” allows for flexible navigation, meaning an animal can:

  • Find its way to a goal even if its usual path is blocked.
  • Take shortcuts.
  • Return to a specific location from various starting points.
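
In code, the simplest stand-in for such a map is a graph of places searched with breadth-first search. The territory below is hypothetical, and a graph search is only my assumed analogy for what a cognitive map makes possible, not a claim about how neurons implement it.

```python
from collections import deque


def shortest_path(cognitive_map, start, goal, blocked=frozenset()):
    """Breadth-first search over known places, skipping any blocked connections."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        here = path[-1]
        if here == goal:
            return path
        for neighbour in cognitive_map[here]:
            edge = frozenset((here, neighbour))
            if neighbour not in visited and edge not in blocked:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None   # no known route


# Hypothetical territory: a short route via the log, a longer one via the rock and the pond.
territory = {
    "nest": ["log", "rock"],
    "log": ["nest", "food"],
    "rock": ["nest", "pond"],
    "pond": ["rock", "food"],
    "food": ["log", "pond"],
}

usual = shortest_path(territory, "nest", "food")
detour = shortest_path(territory, "nest", "food", blocked={frozenset(("log", "food"))})
from_pond = shortest_path(territory, "pond", "food")
print(usual, detour, from_pond)
```

An animal that had only memorised the sequence "nest, log, food" would be stuck when the log route is blocked. An animal holding the whole map simply plans around the obstacle, or starts from somewhere else entirely.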

While the ability to learn a spatial map might seem fundamental to us – a trait common across all vertebrates – Bennett’s relative brevity on the topic belies its profound complexity. For Artificial Intelligence, building and utilising truly robust internal maps of the world is an enormously difficult and ongoing challenge, far exceeding the capabilities of many current systems.

  • Before an AI can build a map, it needs to accurately perceive its environment. Unlike a digital game, the real world is chaotic: lighting changes, objects move, sensors have noise, perspectives shift.
  • Humans and animals can walk into a completely new room and almost instantly build a mental map, identifying key features and their relationships. They update this map on the fly as they move. In robotics this is the problem of Simultaneous Localisation and Mapping (SLAM), and AI systems still struggle to do it robustly in cluttered, changing environments.
  • Biological brains don’t just store pixel maps; they create abstract, flexible representations of space. They understand concepts like “next to,” “behind,” “between,” “shortcut,” and “this leads to that.” This is more than just coordinates. AI models often struggle to create these truly semantic and relational maps.
  • Biological organisms learn incredibly sophisticated spatial maps from relatively sparse and noisy sensory input, and with little explicit “training.” A child explores a new house once and largely understands its layout. Current AI, particularly deep learning, still needs vast, curated data to build even basic representations.
  • The map isn’t just for knowing where things are; it’s for knowing what to do. For example, getting an AI to understand that a “door” leads to a “new room” and that “new room” has “more food” is far more complex than simply recognising the pixels of the door.

In essence, while AI has made incredible strides in specific aspects of vision or pathfinding, the seamless, robust, and adaptive spatial mapping capabilities of a vertebrate brain – its ability to build and use flexible internal representations of an ever-changing 3D world from imperfect sensory data to guide complex behaviours – remain a “grand challenge” for artificial intelligence.

Thinking

Following the initial breakthroughs of basic reinforcement learning, the journey of intelligence faced new environmental pressures.

A critical turning point came with the Permian-Triassic extinction, a devastating event that reshaped life on Earth.

While many perished, a lineage of warm-blooded, mammal-like reptiles called therapsids survived, though greatly reduced in size.

Their high metabolic rate (a side effect of warm-bloodedness) allowed their brains to operate much faster than their cold-blooded counterparts, enabling more complex computations.

For hundreds of millions of years, despite an explosive diversification of animal forms, the fundamental architecture of the brain for basic learning remained remarkably conserved across most vertebrates.

Evolution seemingly “settled” for a brain primarily focused on learning by doing.

However, the small, secretive lifestyle of these early mammals – hiding in burrows or trees – presented a unique evolutionary advantage: they consistently had the opportunity to make the “first move” from a concealed position, observing their surroundings before acting.

This ecological niche placed immense pressure on their brains to exploit this “first move” advantage.

Eventually, a neural innovation emerged: the neocortex.

This new region of the brain, with its distinct and powerful circuitry, bestowed upon these small mammals a transformative superpower: the ability to imagine actions and outcomes before they occurred. They could mentally rehearse different scenarios, weigh potential risks and rewards, and pick the best path. This marked a pivotal shift from reactive intelligence to proactive, anticipatory thought.

In modern humans, the neocortex is the largest and most complex part of our brain, responsible for the very qualities we associate with advanced intelligence. It’s the seat of conscious thought, language, abstract reasoning, planning, self-awareness, and complex problem-solving. Its intricate, highly organised structure allows us to not only imagine simple actions but to construct elaborate mental models of the world, ponder hypothetical futures, and engage in the deep, nuanced thinking that defines human cognition.

Because the neocortex enables advanced intelligence and performs so many functions (vision, hearing, touch, language, planning, etc.) across billions of neurons and trillions of connections, we initially thought it was a system of thousands of subsystems, each devoted to a particular task.

But then Mountcastle came along and hypothesised that the neocortex was not made up of different things, but of the same fundamental thing repeated countless times.

This “fundamental thing” is the neocortical column: the basic building block of the neocortex, which can be repurposed to process any kind of input.

The differences in function (e.g., visual vs. auditory cortex) arise not from different internal wiring of the columns, but from what input they receive and what outputs they send.

This is a concept often called “cortical uniformity” or the “generic cortical microcircuit”.

Bennett says, “To those in the AI community, Mountcastle’s hypothesis is a scientific gift like no other.” Indeed, because if the neocortex were truly a patchwork of thousands of utterly unique, hard-coded subsystems, it would be almost impossible to reverse-engineer or replicate artificially. Mountcastle’s hypothesis suggested a unifying principle – a repeatable, foundational building block.

This implies that if AI researchers can understand and replicate the core computational logic of a single neocortical column, they might have a generic, scalable blueprint for building highly capable, multi-purpose AI, rather than having to design countless specialised modules from scratch. It offers a path to general intelligence through a reusable component.

The word “general” is worth talking about.

Bennett mentions twice that the computational framework of the neocortical column is so general that you can feed it an auditory input, a visual input, an olfactory input, or a somatic input, and the column will process the incoming data in the same manner.

The implication is that the same basic computational engine (the columnar circuit) can perform its characteristic operations (e.g., detecting features, associating inputs, learning patterns) on whatever input it receives.

There’s compelling experimental evidence supporting this.

In classic experiments, the visual input pathway of ferrets was surgically rewired to project to their auditory cortex, rather than their visual cortex, early in development. Remarkably, the auditory cortex, receiving visual input, began to process visual information and even developed visual receptive fields and columnar organisation akin to a normal visual cortex.

This suggests that the neocortex is, in a sense, a “tabula rasa” (blank slate) in its fundamental computational capacity, its function being determined by its input.

Now, if the computational unit is fundamentally the same, and it can work reliably on different kinds of input – this tells me that somewhere in there, the inputs are being normalised or standardised.

If a neocortical column is truly a general-purpose computational unit, capable of handling visual, auditory, tactile, or olfactory data, then the raw, widely diverse physical signals from our senses (photons, sound waves, chemical molecules, pressure) must somehow be converted into a common, standardised neural language that the column can “understand”.

And indeed, this is exactly what happens in the brain.

By the time sensory information reaches the general-purpose neocortical columns, it has already been:

  1. Transduced: Converted from physical energy into electrical signals.
  2. Pre-processed: Filtered, amplified, and organized into abstract feature maps by dedicated sensory pathways.
  3. Normalised: Scaled and adjusted to fit within the dynamic range of neuronal firing, regardless of the absolute intensity of original stimulus.

This means that while the content of the input is different (visual pattern vs. auditory pattern), the format of the signal being fed into the neocortical column – a pattern of electrical spikes, appropriately scaled and filtered – is sufficiently similar across modalities for the general computational framework of the column to operate on it.
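
A rough sketch of the idea: two signals on wildly different physical scales, once rescaled into a shared range, can be handed to one and the same generic routine. Min-max scaling, the stand-in "column" function and the numbers are all my assumptions for illustration. The brain's actual normalisation mechanisms are far richer than this.

```python
def normalise(raw_signal):
    """Rescale any raw signal into the range 0..1, whatever its original units."""
    low, high = min(raw_signal), max(raw_signal)
    if high == low:
        return [0.0 for _ in raw_signal]
    return [(x - low) / (high - low) for x in raw_signal]


def generic_column(activity):
    """Stand-in for a general-purpose unit: report which input channel is most active."""
    return max(range(len(activity)), key=lambda i: activity[i])


light_intensity = [1000.0, 3000.0, 2000.0]   # hypothetical readings on a large scale
skin_pressure = [0.25, 0.75, 0.5]            # hypothetical readings on a small scale

light_code = normalise(light_intensity)
pressure_code = normalise(skin_pressure)

print(light_code, pressure_code)
print(generic_column(light_code), generic_column(pressure_code))
```

The content differs (light versus pressure), but the format reaching the generic routine is identical, which is what lets one piece of machinery serve both.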

The Neocortical column, ladies and gentlemen, was the building block of advanced intelligence.

And it powered Bennett’s third breakthrough in intelligence: Imagination.

Principle: If the neuron is the building block of intelligence, then the neocortical column may be called the building block of advanced intelligence. It is a specific architectural arrangement of many neurons (and their connections) within the neocortex. Not a single neuron, but a microcircuit or a module. Its power lies in its computational generality and repetition, allowing for the sophisticated processing of diverse sensory inputs.

Application: Neocortical columns power our higher-order functions like imagination, abstract thought, and complex planning.

Strategist’s Note: Neocortical columns remind me of fundamental principles – where the same principle can be applied to several very different scenarios in life. Things like the Pareto Principle or “Consistency over Intensity”. Observing and uncovering these foundational principles and keeping them in mind as you experience life is sure to serve you in good stead.

The Narrator

Imagine eating a rich chocolate caramel cake.

Can you taste it? Can you taste the tang of the red cherry layering on the top? Can you taste the mushy sweetness of the dark chocolate middle? Can you taste the salty-sweetness of the caramel? Can you feel the cool softness of it as it melts in your mouth? Yes?

Now ask yourself – how?

“How was I able to taste an imaginary cake? How did it feel so real despite happening in my head?”

The answer reveals something cool about how we perceive “reality”.

Not only are you able to taste a real cake, you are also able to imagine tasting an imaginary one.

There was no actual sensory input to get you to think you were eating a cake. You just imagined it.

In fact, in the same way, you can imagine anything you’ve experienced before. You can imagine feeling cold and shivering, you can imagine pain, you can imagine you’re tired and fatigued.

As I said before, the neocortex emerged to enable organisms to imagine different plausible scenarios before committing to one in real life (ideally, committing to the one in which they saw themselves the best off).

They needed a way to take a break from reality to imagine various futures, before coming back to it and committing to an action.

Day dreaming, if you will.

And the way evolution went about achieving this is the most mind-blowing thing to me. Evolution basically said: “You won’t need to take a break from reality to dream. All you’ll ever do is dream, and the times you need to pay attention to reality, your dream will simply start mimicking it.”

This is known as the Helmholtz hypothesis: “… you don’t perceive what you actually see, you perceive a simulated reality that you have inferred from what you see.” (a.k.a. “perception as controlled hallucination” or “perceptual inference”).

Our brains don’t just passively receive sensory data; they actively generate a model of the world and then compare incoming sensory data to that model.

Let me repeat: you’re not seeing reality, you’ve never seen it.

The only thing you’ve ever seen, what you’re seeing right now, is a simulation of what your brain thinks reality should be like, every moment, all the time.

You’re dreaming, all the time. The fact that your eyes are open sometimes is a secondary detail.

Your brain is constantly generating a “best guess” or prediction of what the world is like, based on all your past experiences, knowledge, and current context. This internal model is what you “see.”

Incoming sensory data from your eyes (or ears, skin, etc.) doesn’t form your perception directly. Instead, it’s compared against this internal prediction.

If the sensory data matches the prediction, the brain doesn’t do much. The prediction is confirmed, and that’s what you perceive.

If the sensory data contradicts the prediction, then a “prediction error” is generated. This error signal is what gets propagated up the hierarchy of the brain, prompting your internal model (the simulation) to update itself to better match the new, unexpected reality. This is a continuous feedback loop.

Think of it as your brain running a sophisticated, real-time virtual reality simulation, and then using incoming sensory data to continuously calibrate and update that simulation.

The conscious experience of “seeing” or “hearing” is the result of this internal simulation, validated by the senses.

During wakefulness, your brain’s internal simulation (your perception) is constantly being “tethered” or “grounded” by the incoming sensory stream.

This sensory input acts as a powerful reality check, constantly correcting and refining your internal model.

During dreaming (and even hallucination in general), the brain is still generating a rich, immersive simulation. However, the crucial difference is the lack of strong, coherent, and consistent external sensory input that would normally “correct” or “override” the internal model.

So, the internal simulation is free to run wild, unconstrained by external reality checks.

This is why dreams can feel so real, yet often contain impossible or illogical elements – there’s no incoming data to signal a “prediction error” for a flying pig.

In a poetic sense, you have a Narrator with you. He narrates to you what he sees and hears in the outside world, and it is his narration that you’ve been listening to all this while. And just like a good narrator, he does not bore you with the overly complex details, conflicting signals or loose ends; instead he presents to you a compelling story – the best he can narrate given what he is seeing and hearing at the moment.[5]

How the Narrator works

So, how does your brain’s “Narrator” – this storyteller that generates your reality – actually function? The answer lies in the fundamental architecture of the neocortical column, the very “building block of advanced intelligence” we discussed.

Unlike most traditional computer circuits that process information in only one direction, the neocortical column is a master of two-way communication.

It’s built with connections that flow both forward (bottom-up) and backward (top-down).

This bidirectional design is the engine of the Narrator:

  • When you are perceiving the world, sensory data from your eyes, ears, and skin flows bottom-up, providing the raw, noisy evidence from reality.
  • Simultaneously, your brain’s higher-level areas – the “Narrator” – are constantly sending top-down signals, which are essentially predictions or “guesses” about what that raw sensory data should look like, based on your prior experiences and internal models.

Your brain then constantly compares these two streams across the six layers of the neocortical column. The difference between the incoming sensory data and the brain’s top-down prediction is called a prediction error.

Only this “surprise” signal travels back up the hierarchy, prompting your internal model to adjust itself.

This continuous loop of “predict ➡️ compare ➡️ update” is how your brain stays tethered to reality, ensuring your internal “dream” accurately mimics the outside world.
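
Stripped to its bones, the loop looks something like the sketch below. It tracks a single number (say, a hypothetical room temperature) rather than a rich sensory scene, and the learning rate of 0.5 is an assumption of mine.

```python
def perceive(observations, belief=0.0, learning_rate=0.5):
    """Run the predict -> compare -> update loop over a stream of observations."""
    errors = []
    for observed in observations:
        prediction = belief                    # predict: the top-down guess
        error = observed - prediction          # compare: the prediction error
        belief = prediction + learning_rate * error   # update: nudge the internal model
        errors.append(error)
    return belief, errors


# The world holds steady at 20.0; the model starts out believing 0.0.
belief, errors = perceive([20.0] * 6)
print(belief, errors)
```

Each pass through the loop halves the surprise. Once the model matches the world, the error signal fades and there is almost nothing left to send up the hierarchy.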

This two-way street means that the neocortical column can effectively run in two modes:

  • Perception (Predicting Sensory Inputs): In this mode, the column is primarily driven by bottom-up sensory data, and its goal is to reduce prediction error by updating its internal model to accurately reflect reality. This is when your Narrator is mostly listening intently to the outside world.
  • Generation (Producing Outputs without Direct Input): In this mode, if the bottom-up sensory input is weak or absent (like in imagination or dreams), the top-down predictions are no longer being strongly corrected by reality. The column can then freely generate patterns of activity based purely on its internal model. This is when the Narrator is mostly thinking to himself. This is what allows you to vividly imagine that cake.

Crucially, the very same neural circuits are used for both perception and generation. They are two sides of the same coin: the brain doesn’t have separate machinery for “seeing” and “imagining”.

It’s merely balancing top-down predictions against bottom-up sensory input.

When the input is strong and coherent, the system is constrained by reality (perception).

When the input is weak or absent, the internal generative power is unleashed (imagination, dreams).
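
One way to picture "the same circuit, two modes" is as a single blending function with a dial for how strongly the senses are driving it. The function name, the linear blend and the toy "cake" vectors are assumptions of mine, purely for illustration.

```python
def experience(prediction, sensory_input, input_strength):
    """Blend top-down prediction with bottom-up input.

    input_strength is 1.0 when the senses fully constrain the result (perception)
    and 0.0 when they are silent (imagination, dreaming).
    """
    if not 0.0 <= input_strength <= 1.0:
        raise ValueError("input_strength must be between 0 and 1")
    return [
        input_strength * sensed + (1.0 - input_strength) * predicted
        for predicted, sensed in zip(prediction, sensory_input)
    ]


imagined_cake = [0.9, 0.8, 0.7]   # hypothetical internal model of "cake"
empty_plate = [0.0, 0.0, 0.0]     # what the senses actually report

awake = experience(imagined_cake, empty_plate, input_strength=1.0)
dreaming = experience(imagined_cake, empty_plate, input_strength=0.0)
print(awake, dreaming)
```

There is no second function for imagining. Turning down the dial on the very same function is all it takes.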

This is why you cannot vividly imagine something and pay full attention to reality at the same time – it’s the same circuit. It is also why people with neocortical damage can lose not only the ability to perceive certain things but also the ability to imagine them.

This bidirectional, predictive architecture, famously captured by the Helmholtz machine model (which had both forward and backward connections), provides the fundamental blueprint for how the brain functions.

It also, quite remarkably, forms the conceptual and mechanistic foundation for a large and powerful class of modern Artificial Intelligence: Generative AI.

In essence, the neocortical column itself can be thought of as a biological generative AI microcircuit, constantly predicting and creating, whether that’s the reality you experience or the dreams you have at night.

The Narrator’s importance

You’d be asking at this point: “Why do I need this Narrator?”. Because, as Bennett says, “It is when the simulation in your neocortex becomes decoupled from the real external world around you—when it imagines things that are not there—that its power becomes most evident.“.

Evolution realised that this Narrator, though energy hungry (the human brain is about 2% of body mass yet consumes roughly 20% of the body’s energy), was an adaptive advantage.

Consider these five reasons:

Sensory Input is Ambiguous, Noisy, and Incomplete:

  • Ambiguity: A single retinal image can be interpreted in multiple ways (e.g., optical illusions). A distant sound could be from many sources. Direct wiring would lead to constant misinterpretations.
  • Noise: Sensory organs are imperfect. There’s always background noise, missing data (like your blind spot), or temporary occlusions. If you directly processed only the raw input, your perception would be patchy, full of gaps, and constantly flickering.
  • Incompleteness: Your senses only capture a tiny fraction of the world. You don’t see the other side of an object, but you perceive a whole object. You don’t hear every single molecule of air vibrating, but a cohesive sound.

Overcoming Neural Delays and Making Rapid Decisions:

  • There’s a significant time lag between an event in the world and your brain’s processing of it. If you were purely reactive to raw input, you’d always be behind reality.
  • The Narrator allows you to anticipate what’s coming next, effectively closing that temporal gap. It projects itself into the future to perceive the “present.” This enables quick, adaptive responses crucial for survival (e.g., catching a ball, dodging a predator).

Efficiency and Energy Conservation:

While the Narrator’s simulation itself is “expensive,” it ultimately leads to efficiency.

  • Prediction Error Minimisation: The brain doesn’t have to process all incoming sensory data in detail. It only needs to process the prediction error – the difference between what it expected and what it actually received. If the prediction is accurate, very little energy is needed. This is far more efficient than constantly processing a firehose of raw, noisy data from scratch.
  • Prioritisation: Prediction errors tell the brain exactly where to focus its limited processing resources – on the unexpected, on what matters for updating its model.

Learning and Adaptation:

The predictive layer is the learning mechanism. When predictions fail (prediction error), the internal model is updated.

This constant cycle of predict-compare-update is how the brain continuously learns about and adapts to its environment.

Without it, learning would be slower, more rigid, or impossible in complex, dynamic worlds.
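
The predict-compare-update cycle can be sketched in a few lines. This is a toy illustration under my own assumptions: the "internal model" is a single number, and `learning_rate` is a hypothetical parameter controlling how strongly each prediction error corrects it.

```python
def predict_compare_update(observations, learning_rate=0.5, initial_estimate=0.0):
    """Run a toy predict-compare-update loop over a sequence of observations."""
    estimate = initial_estimate
    errors = []
    for observed in observations:
        prediction = estimate                              # predict
        error = observed - prediction                      # compare
        estimate = prediction + learning_rate * error      # update
        errors.append(error)
    return estimate, errors


# The world keeps presenting the same value; the errors shrink as the model catches up.
final_estimate, errors = predict_compare_update([8.0, 8.0, 8.0, 8.0])
```

Notice that as the predictions improve, the errors (the only thing that needs detailed processing) get smaller, which is the efficiency argument from the previous section.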

Dealing with Imagination, Planning, and Dreams:

If perception is merely reading inputs, how do we imagine, plan, or dream?

These are internal simulations without external sensory input.

The same Narrator that constructs your conscious reality during wakefulness is simply unleashed from sensory constraint during sleep or imagination.

Principle: Your brain does not perceive reality as it is, but instead, it simulates it at every moment of your life – waking or sleeping. And it compares this simulation with incoming sensory inputs. When the simulation is very different from sensory inputs, your brain updates the simulation to match reality, and that updated simulation is what you perceive. If the simulation broadly matches sensory inputs, or if the inputs are too weak, your brain goes with what it thought the simulation should be like.

Application: When you are walking, your brain is simulating solid ground at every step and the sensations (stability, balance, control) that go with it. When you step into a pothole and solid ground gives way to thin air, your brain is quickly forced to update the simulation to match your new reality (falling into the pothole), and you instantly reach out to grab on to something.

Strategist’s Note: If nothing else, this should make you a little humble – knowing that whatever you have ever perceived till now, and indeed, whatever you will ever perceive – the times you were sure you saw your partner smirk at your dress, the times you were sure the comment from your boss contained a veiled insult, the extra attention that your mother always seemed to shower on your sibling – was, at least a little bit, this “Narrator’s” perception of reality and not reality itself.

Simulating yourself

Simulating what is happening outside your head (the external world) is not the only thing the Narrator (neocortex) does – it is also responsible for things that happen inside your head, like thinking.

Specifically, it performs three crucial, interconnected roles that allow for sophisticated thought.

Vicarious Trial and Error (VTE): This is our brain’s capacity to mentally “try out” different actions or paths and evaluate their likely outcomes, all before committing to any physical movement.

This internal rehearsal allows an organism to anticipate the consequences of its choices and select the best course, dramatically reducing wasted effort and risk in the real world.

Counterfactual Learning: This is the powerful ability to learn from events that didn’t happen, or from scenarios that were merely simulated. Instead of only reinforcing actions that led to actual rewards, the brain can now consider “what if” scenarios: “What if I had taken that other path? What if the predator hadn’t appeared?” By comparing imagined outcomes with real ones, or with other imagined possibilities, the Narrator refines the organism’s understanding of cause and effect, optimizing decision-making in ways far beyond simple trial-and-error.

Episodic Memory: This is our ability to mentally re-experience specific past events, complete with their sensory details, emotions, and chronological order—the “what, where, and when.” This is distinct from simply remembering facts or skills. It’s the Narrator’s capacity to weave these elements into coherent personal “stories” that allows us to learn from our unique past experiences, predict future scenarios based on similar events, and build a rich, continuous sense of self. It is this suite of capabilities that elevates intelligence to truly remarkable levels.

Your Narrator’s notebook

Your Narrator’s crucial ability to simulate and predict relies entirely on how well it understands the world – what I’m calling the “Narrator’s Notebook“. This notebook is the brain’s dynamic understanding of how the world works and how actions lead to consequences. It’s an explicit, learned, internal model of the environment’s dynamics, actively built and refined within the neocortex.

But before there was the Narrator’s Notebook, indeed, before the Narrator himself – there was a much simpler way of dealing with life: as it came. We’ve already discussed this previously – reinforcement learning.

The “Scorecard” (Model-Free Learning)

In contrast to the Narrator’s sophisticated mental simulations, much of the daily action of organisms, and indeed, much of your daily actions today, particularly habituated actions and simple stimulus-response behaviours, are governed by older, more primitive brain structures (like the basal ganglia and amygdala).

These systems operate largely “model-free”, relying on a vast ‘scorecard’ of learned associations: ‘Situation X -> Action Y -> Reward Z (Good!)’ or ‘Situation A -> Action B -> Penalty C (Bad!).’

When we scroll social media seemingly in a trance, or automatically tie our shoelaces, it’s often these older processes running the show.

This ‘scorecard’ is built directly from real-world trial and error, creating fast, reflexive ‘gut feelings’ or automated responses without the need for conscious deliberation or simulation – this is what we were discussing in the previous section on reinforcement learning.

The Narrator (our conscious, simulating neocortex) is largely disengaged during these times, simply allowing these highly efficient, hard-wired behavioural programs to execute.
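
Here is a minimal sketch of what such a "scorecard" could look like in code. It is my own simplification, not Bennett's model or any particular reinforcement learning library: the class name `Scorecard`, the `learning_rate` parameter and the example situations are all hypothetical.

```python
from collections import defaultdict


class Scorecard:
    """A toy model-free learner: it stores scores, not rules about the world."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        self.scores = defaultdict(float)  # (situation, action) -> learned score

    def record(self, situation, action, reward):
        """Nudge the stored score towards the reward that was actually received."""
        key = (situation, action)
        self.scores[key] += self.learning_rate * (reward - self.scores[key])

    def best_action(self, situation, actions):
        """Pick the highest-scoring action; no simulation, just a lookup."""
        return max(actions, key=lambda action: self.scores[(situation, action)])


card = Scorecard()
card.record("thirsty", "drink", reward=1.0)    # Good!
card.record("thirsty", "scroll", reward=-1.0)  # Bad!
habit = card.best_action("thirsty", ["drink", "scroll"])
```

The scorecard never asks why drinking helps; it only remembers that it did. That is what makes it fast, and also what makes it blind in novel situations.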

The “Rules of the World” (Model-Based learning)

This is where the Narrator comes in, actively filling his notebook by figuring out the rules and dynamics of the environment, allowing him to mentally explore possibilities without actually having the body perform them.

Like a seasoned chef, who after years of trial-and-error (model-free learning), understands ingredients, reactions and heat – thus, can cook a new dish in their head without entering the kitchen. They’ve figured out the “rules“.

The notebook contains entries like: “If I am in Situation P and I take Action Q, then I will typically end up in Situation R”.

This explicit internal model of the environment’s dynamics – this is the simulation you live in.

The Narrator actively consults this section for mental simulation (VTE), planning, and counterfactual learning, playing out possible future actions before committing to them.

The “surprise” for our friend, the Critic, now comes not just from actual external rewards, but from these internal simulations predicting a better outcome than previously expected.

The Narrator actively consults and manipulates this section for planning, problem-solving, and adapting to novel situations, enabling counterfactual thinking – learning from “what if” scenarios that only exist in its mind.

This “Narrator’s Notebook” is what AI researchers call a “World Model” or an “Internal Model“. It’s an AI’s learned representation of how the environment behaves, how actions lead to new states, and sometimes, how those new states translate into rewards. The development of internal world models moves AI closer to the biological paradigm of “learning by imagining” rather than just “learning by doing”. This ability to build a dynamic, predictive understanding of reality and then manipulate it internally for planning and learning is viewed by many as a non-negotiable requirement for achieving AGI.
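
As a rough sketch of "learning by imagining", the notebook can be pictured as a lookup of “Situation P + Action Q -> Situation R” entries that is replayed in the head. Everything below is a hypothetical toy of my own (the `NOTEBOOK` entries, the `REWARDS` values and the function names), not a real world-model implementation.

```python
# Hypothetical notebook entries: (situation, action) -> next situation.
NOTEBOOK = {
    ("home", "walk_north"): "forest",
    ("home", "walk_south"): "river",
    ("forest", "walk_north"): "berries",
    ("river", "walk_south"): "predator",
}

# Hypothetical values attached to some situations.
REWARDS = {"berries": 1.0, "predator": -1.0}


def simulate(start, plan):
    """Mentally play out a plan using the notebook, without moving the body."""
    situation = start
    total_reward = 0.0
    for action in plan:
        # Unknown (situation, action) pairs leave the imagined situation unchanged.
        situation = NOTEBOOK.get((situation, action), situation)
        total_reward += REWARDS.get(situation, 0.0)
    return situation, total_reward


def choose_plan(start, plans):
    """Pick the plan whose imagined outcome has the highest total reward."""
    return max(plans, key=lambda plan: simulate(start, plan)[1])


plans = [("walk_north", "walk_north"), ("walk_south", "walk_south")]
best = choose_plan("home", plans)
```

No predator was met and no berries were eaten while choosing; both outcomes were only imagined.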

What “thinking” is

It’s time to think about the three facts we’ve already discussed:

  1. The fundamental unit of the neocortex is general purpose and the same everywhere (neocortical column).
  2. In case of the outside world, it creates a simulation, an internal model (which is what you only ever “see”), and compares it to actual sensory inputs, modifying the simulation you see to match reality if need be.
  3. And this same fundamental unit is also what powers the other important role – thinking.

Same fundamental unit … simulates outside world … is also responsible for thinking.

  • Now, my question to you is this: Due to the neocortex, what you are perceiving is not the actual outside world but a simulation of it – How do you think you are perceiving your own thinking?
  • My second question to you is this: If you perceive your own thinking as a simulation, whose thinking is it then in reality?

The neocortex is divided into two halves. While the sensory neocortex (at the back) is dedicated to rendering a detailed simulation of the external world, the front half is the frontal neocortex.

This frontal region in humans comprises three main subregions: the motor cortex (involved in planning and executing movements), the granular prefrontal cortex (gPFC), and the agranular prefrontal cortex (aPFC).

For a deeper dive into the intricate architecture and evolutionary journey of these remarkable brain regions, I highly recommend exploring Bennett’s book.

The aPFC’s configuration is quite interesting. You see, just as the sensory neocortex receives input from external senses (eyes, ears, skin) to build its model of the outside world, the aPFC receives its primary input from internal brain structures: hippocampus (our spatial map and memory hub), hypothalamus (regulating internal bodily states like hunger and thirst), and amygdala (processing emotions and valence).

This suggests something remarkable: the aPFC appears to treat inputs of our own internal states, emotions, and past experiences – our “inner world” – the same way the sensory neocortex treats sequences of external sensory information. The aPFC is attempting to explain and predict the animal’s own behaviour, just as the sensory neocortex explains and predicts the flow of external reality. I cannot overstate how groundbreaking this discovery has been for neuroscience.

You may think: “It was preposterous enough to suggest that the brain simulates reality and then compares it to sensory input, but here this guy is saying that the brain simulates the animal itself and then compares it with the actual actions taken by the animal. What? What about free will? What about intent?”

If this is your reaction, then that makes two of us. This is where it gets truly mind-bending and leads to profound philosophical implications.

Just as the brain predicts what it should see, it also predicts what it should do, or what its intentions are.

The aPFC, by modelling these behavioural sequences and goals, essentially creates an internal “simulation of future intent.”

If your aPFC “predicts” you’ll reach for water (based on your internal state of thirst and learned goals), but your body (via basal ganglia/motor cortex) starts moving towards food, this creates a “prediction error” about your own behaviour/intent.

Just as the sensory cortex is predicting how the outside world should behave, the aPFC is predicting how the animal should behave.

But unlike the sensory cortex that updates the simulation to conform to the world, the aPFC tries to make the animal conform to the simulation.

That is, if there’s a prediction error, the aPFC drives action to resolve it, not just updates its model. It tries to make its own predictions come true.

This is what you may call “will“.

The way it does so, in simple words, is by essentially playing “trailers” of alternative futures for the older brain (basal ganglia) to “see”.

This is nothing but the Vicarious Trial and Error (VTE) I explained above. For the older brain, imagined futures and experienced futures both result in “votes” for or against the actions that led to those futures (reinforcement learning).

And if the votes for the actions that lead to a certain future cross a threshold, then the older brain takes those actions.

As Bennett explains: “… neocortex simulates sequences of actions, but what makes the final decision … [the] basal ganglia … [enabled by the aPFC the] process of vicarious trial and error unfolds … votes [accumulate] for each choice in the basal ganglia – the same way it would if the trial and error were not vicarious but real. If the basal ganglia keeps getting more excited by [option ‘A’] than by [‘B’] … then these votes … pass the choice threshold … [and the] basal ganglia will take over behavior … the aPFC vicariously trained the basal ganglia …”.

This process: the aPFC playing trailers of different futures for the basal ganglia, and the basal ganglia, “agnostic” to whether the experience is real or simulated, learning from these internal “trailers” and updating its “scorecard” – This is what you and I call thinking.
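
The "votes crossing a threshold" idea can be sketched as a simple accumulator. This is my own toy rendering of the description above, not a model taken from the book: the function name `deliberate`, the `threshold` value and the imagined values per option are all hypothetical.

```python
def deliberate(simulated_outcomes, threshold=3.0, max_rounds=100):
    """Replay imagined "trailers" until one option collects enough votes.

    simulated_outcomes maps each option to the (hypothetical) value of its
    imagined future. Returns the chosen option and the round it was chosen in,
    or (None, max_rounds) if nothing ever crosses the threshold.
    """
    votes = {option: 0.0 for option in simulated_outcomes}
    for round_number in range(1, max_rounds + 1):
        for option, imagined_value in simulated_outcomes.items():
            # Each replayed trailer adds a vote, exactly as a real experience would.
            votes[option] += imagined_value
            if votes[option] >= threshold:
                return option, round_number
    return None, max_rounds


# Option "A" looks better in every imagined replay, so it reaches the threshold first.
choice, rounds = deliberate({"A": 1.0, "B": 0.5})
```

In this picture, "thinking harder" is just more rounds of replay, and a decision is whatever crosses the line first.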

Bennett (and the predictive processing framework) suggests that “intent” or “will” is not some ethereal, non-physical entity, some “uncaused cause”, but rather a computational construct.

It is a high-level prediction that the aPFC generates about future behaviour and then tries to make happen.

His view doesn’t necessarily eliminate free will, but it redefines it as an emergent property of a complex, predictive, self-organising system.

Our “will” might be the brain’s highest-level prediction of what it’s going to do, which then biases the system to make that prediction a reality.

The aPFC is “generating” it all the time, but, as said before, we do not act out of this “will” when habits are ingrained in us. When the basal ganglia’s habit system doesn’t have a strong, clear answer (when there’s uncertainty or a prediction error, when the stakes require deliberation, or when a novel situation arises), the aPFC engages.

Note, however, that the aPFC does not “come online” immediately at birth – instead, it comes online slowly and learns about the world from the basal ganglia.

In development, infants start with more reflexive, stimulus-response behaviours (basal ganglia/model-free dominant).

As the prefrontal cortex develops, children gradually gain more goal-directed, flexible, and deliberate control. The basal ganglia “teaches” the aPFC about basic rewards, and then the aPFC learns to predict and pursue more abstract goals, subsequently “teaching” or guiding the basal ganglia’s execution.

So, initially babies/toddlers have no “will” (in the sense of “goal-driven behaviour”); it only develops with time.

When it comes to the other component of the frontal cortex, the motor cortex, the situation is similar – as Bennett points out “… aPFC learns to predict movements of navigational paths, whereas the motor cortex learns to predict movements of specific body parts. The aPFC will predict that an animal will turn left; the motor cortex will predict [where] the animal will place its left paw …”.

And similar to the aPFC’s workings, when you have to make a novel, complicated move, the motor cortex first helps you imagine that move, which vicariously trains the basal ganglia on the configuration of muscle movement. Over time, through real and vicarious trial and error, you become “a natural” at the move and no longer need (as much) the motor cortex.

So, it seems the neocortex is just simulating all the time.

Whether it is simulating the outside environment, the organism’s thoughts, or the organism’s body movements.

Whatever it is doing, in the end it is always just simulating. It seems.

Metacognition

We’re now underway to seeing the brain evolve to what it is today – there is still some time to go, but we’re getting there.

Bennett talks about the era of mammals that followed the meteorite that struck Earth.

He talks about the subset of mammals that humans belong to, primates, who shifted from nocturnal to diurnal, developed opposable thumbs, became arboreal frugivores, lived in groups and whose brains exploded in size.

He notes that while all mammals experienced brain size growth, primates experienced the most (besides maybe dolphins and elephants).

He tries to understand why such a large brain was needed, and says that while we tend to think it is about the size of the social groups we live in, there is more to it.

“It isn’t group size in general but the specific type of group that early primates created that seemed to have required larger brains.“.

Many mammals decided it was better to live in groups, paying the costs that came along with it (food scarcity, competition, infighting) for the benefits (safety, protecting the young). But it was the primates that showed a sophisticated “theory of mind“: the ability to put yourself in someone else’s shoes and see the situation from their POV. And not just to understand their intentions, but also their knowledge: to know that other intelligent beings exist, who can have different knowledge and intent than you.

He talks about how grooming between monkeys is much more than an attempt at hygiene, that it plays an important role in social structure.

He also notes that primates are extremely sensitive to violations of social hierarchy, and that “Unlike most other social animals, for primates, it is not only physical power that determines one’s social ranking but also political power … even young children will regularly challenge adults of lower-ranking families, but they won’t challenge adults of higher-ranking families.“.

Basically, which family you belong to determines your swagger in society (true for humans as well).

And not just the family you belong to, but the affiliations you carry as well – a lower ranking primate can have associations (say, due to unique skills) with a higher ranking individual, and thus inherit parts of the latter’s social status.

Bennett says that “Low-ranking individuals with powerful grooming partners get harassed much less, even when the high-ranking ally is out of sight; everyone in the group knows ‘Don’t mess with James unless you want to deal with Keith’ … ability to forge such allyships is one of the primary determinants of an individual’s rank …”.

Why did primates get so politically savvy? Bennett says that as they became arboreal frugivores, they suddenly found themselves not only with an abundance of food, but also time6. And somehow it was adaptive (or at least, not maladaptive) for evolution to spend these newfound resources on developing bigger and more sophisticated brains, instead of bigger and more sophisticated bodies.

Enter: You

The growth in primate brain size was not just about the old parts getting bigger – some new parts also emerged during this time.

Specifically, two new parts emerged: the granular prefrontal cortex (gPFC), which I mentioned briefly earlier, and something Bennett calls the “primate sensory cortex” (PSC). These two are deeply interconnected and, like the rest of the neocortex, are also simulators.

The question is, naturally, what are these things simulating?

A hint comes from the fact that the gPFC sits atop the aPFC (which itself sits atop the basal ganglia and other structures).

If the basal ganglia enables reinforcement learning, and the aPFC simulates this reinforcement learning (VTE, as we discussed), thereby avoiding expensive real-world reinforcement, then the gPFC is likely to be simulating the aPFC’s simulation.

You read that right, it’s a simulation on top of another simulation.

What was the aPFC’s simulation? Build models (of the animal and its world) and manipulate them to imagine different futures.

Basically, Think.

So, when the gPFC simulates the aPFC’s simulations, it is simulating thinking itself: metacognition. It is thinking about thinking.

Effectively, a meta-simulation hierarchy:

  • Basal Ganglia: Model-free RL (direct action-reward associations).
  • aPFC: This region constructs a “model of intent”. It is essentially predicting “what the animal wants (goals) and knows (about its actions) and thinks (about its immediate behavioural plan).” This is the first level of internal simulation about oneself as an agent. It generates your “will” or “intent” to get water.
  • gPFC: This region then takes the outputs or predictions of the aPFC as its inputs. It’s not looking directly at sensory data or basic drives. It’s looking at the aPFC’s simulations of intent and behaviour.
    • The gPFC then “constructs explanations of” these aPFC-generated intentions. It essentially forms a meta-model – a model of the aPFC’s model of intent.
    • This allows the gPFC to reflect on questions like:
      • “Why do I want that?” (reflecting on one’s own motivations)
      • “Do I really know that?” (reflecting on one’s own knowledge)
      • “Is this plan coherent with my other beliefs or goals?” (integrating different intentions)
      • “What is my state of mind?”

Bennett’s quote here is revealing: “Just as aPFC constructs explanations of amygdala and hippocampus activity (invents ‘intent’), perhaps the gPFC constructs explanations of the aPFC’s model of intent – possibly inventing what one might call a mind … gPFC constructs explanations of the simulation itself, of what the animal wants and knows and thinks. Psychologists and philosophers call this metacognition …”

In essence, the gPFC is simulating the very mental processes of the aPFC, creating an internal, reflective model of itself as a thinking, wanting, knowing entity. It simulates abstract rules and concepts, rather than concrete actions or immediate goals. And, if you allow me to indulge myself for a moment, this is the level at which I operate here at Sunchaser.

This is the ultimate “simulation of self” that leads to metacognition and our rich subjective experience of having a “mind”.

While the movie “The Matrix” was about an external party putting all of us in a simulation, it appears from this that all of us are living in a sort of “Matrix” after all.

If external reality is a simulation, and our internal experience of intent, action, and self is also a simulation, then we live in a self-generated, self-referential simulation.

The Neocortical column is a general purpose simulation generating unit7, and given enough training data this beautiful piece of evolutionary hardware has the capacity to simulate whatever you want it to – sounds, touch, feelings, vision – and indeed, reality, within and without.

Its “output” is always a simulation, and its “learning” is always about refining that simulation to better predict its inputs (whether those inputs are sensory data, internal states, or the expected consequences of actions).

And you, you live inside this reality. The implications for neuroscience, artificial intelligence, and indeed, philosophy are profound. I have expounded enough on this – but the fact is, that evolution decided it was adaptive to encase “you” in a simulation. Think about that.

Principle: The way that the neocortical column became the building block of advanced intelligence is by being able to simulate various features of the environment as well as the organism itself. Although simulations are computationally and energetically expensive, evolution found this approach adaptive and it has persisted through generations.

Application: You do not perceive reality as it is, you perceive a constantly updating simulation. You do not perceive your intent as it is, you perceive a constantly updating simulation of what you think “your” intent is. You do not directly see someone else doing something, you first simulate yourself doing the same thing, realise what you are doing and then extend that to understand what they are doing.

Strategist’s Note: Learn the importance of having a “sandbox” in every important area of life. Get into the habit of always making space for a place where you can simulate various outcomes and your responses to them – a notebook, a computer program, your own mind, anything. If evolution thought sandboxing would be a good idea, and you are still here, still reading this, you can safely assume it is.

Building a complex model of a “mind” (a system that generates intentions, holds beliefs, and processes information) is incredibly challenging.

It’s far more efficient for evolution to develop one general-purpose “mind-modeling” mechanism and then use it repeatedly.

The most accessible “mind” to model is your own, as you have direct, continuous access to its “inputs” and “outputs.”

Enter: Future you

Once your gPFC-powered brain is simulating your thinking in the present moment – “What am I thinking about right now?” – it is a short step from there to “What will I be thinking about in the future?”.

This ability to plan for future, differing motivational states is closely linked to episodic future thinking (mentally simulating specific future events) and episodic memory (mentally re-experiencing specific past events).

Enter: Others

And once your gPFC powered brain is simulating your own thinking – “Why am I thinking this way?” – it is a short step from there to “Why is he thinking this way?”.

Your own mind provides the best blueprint. If you want to understand why someone is doing something, you can simulate what you would do in their place.

This ability to simulate others was advantageous in the new social world that the early primates enrolled themselves in. Having political savvy was as important as having brute force. In previous eras, intelligence grew to navigate the world, but in this new era, intelligence grew to navigate other intelligent beings.

Transferring simulations

Finally we arrive at Bennett’s fifth and final breakthrough: language.

While not unique to humans, language really earns its name with us.

  • While other beings have their own versions of language, Bennett posits that those are instinctual, evolutionarily hardwired. For example, Vervet monkey alarm calls (for eagles, snakes etc.) are highly innate and appear hardwired, with little variation across populations or even individuals raised in isolation.
  • Also, humans readily use declarative labels (stating facts, describing: “That is a bird”) while non-human animal communication is primarily imperative (commands, requests: “Give me food!”).
  • Only humans employ grammar. Human grammar allows for generativity (creating endless new sentences from finite words) and recursive structures (embedding clauses within clauses), enabling the expression of complex ideas, hypotheticals, and temporal relationships.
  • Humans acquire language rapidly, universally (barring severe pathology or deprivation), and often without explicit teaching. Children “invent” grammar, even in pidgin-to-creole transitions. This points to a powerful innate language faculty (Chomsky’s Universal Grammar, Pinker’s Language Instinct). No other animal shows this spontaneous, universal urge to create and use complex linguistic systems.

How language arose

There isn’t a single “language organ” that emerged de novo. Language recruits and repurposes existing brain structures (like the neocortex, basal ganglia, etc.).

Bennett states that the ability to learn language is, at least in part, a consequence of a simpler genetically hard-coded instinct to engage in conversation.

He gives examples of babies engaging in “proto-conversations” (babbling, turn-taking, joint attention, pointing) with parents before they actually learn a language. They are also able to attach descriptive labels to things.

He also posits that humans may have evolved a unique hardwired instinct to ask questions, to inquire about the inner simulations of others.

Asking questions is a fundamental human drive. If our gPFC is simulating others’ minds, then actively seeking to clarify or expand that simulation via questions (“What do you think? Why did you do that?”) makes sense. It’s a direct way to reduce prediction error about other minds. This is a plausible and insightful extension of Theory of Mind.

In this way, Bennett says evolution gave humans a curriculum to learn language – because complicated concepts cannot be given in one go to bio-intelligence (or even AI).

He is saying that the hardwired need in human infants to engage in proto-conversations, and our curiosity about people’s inner simulations provides a sort of curriculum for language to develop in later years – “… language emerges in the human brain through a hardwired curriculum …” – just like flying emerges in birds through a hardwired need to jump.

For a much deeper dive into the fascinating “Perfect Storm” that led to the creation of language and our particular form of intelligence, and why it is so rare in the animal kingdom, please consider buying the book. It is a sophisticated synthesis by Bennett: “Our language, altruism, cruelty, cooking, monogamy, premature birthing, and irresistible proclivity for gossip are all interwoven into the larger whole that makes up what it means to be human.“.

Continuing, Bennett says that language is a way for us to transfer our simulations to others. And while the same happens when animals communicate with each other (indeed, in the absence of this, the other party will not be able to understand you), “… these kinds of transfers are undetailed and inflexible, capable of transferring information only with genetically hard-coded signals … always few in number and cannot be adjusted or changed …”.

Perhaps “transfer” is not the right word; rather, I will say that language provides us a way to create such conditions in others’ minds that we think will lead to the same or similar simulations as ours.

In this way, humans can learn not only from their own imagined actions (simulating), not only from others’ actual actions (imitation learning), but also from others’ imagined actions.

This allows myths (or as I like to call them: hallucinations) to become “common” – powered by language, an entire society is now able to share in hallucinations like the concept of a “nation“, “money“, or “religion“.

Such societies are able to cooperate in flexible ways as Harari popularised in “Sapiens“.

Bennett points out that while the theory of mind enabled initial politicking and group behaviour, this ability alone could not scale the size of the groups beyond the capacity of brains to keep track of relationships, but with language that enabled common myths, groups could be infinitely large.

In this way, Bennett points out that just like nucleotide structures enable the transmission of genes, and useful genes tend to propagate over generations; language enables the transmission of memes, and useful memes tend to propagate over generations.

In essence, both genes and memes are useful information – or as I said – useful configuration, and this configuration, if adaptive, persists over time.

I cannot cap this off any better than Bennett himself: “The real reason why humans are unique is that we accumulate our shared simulations (ideas, knowledge, concepts, thoughts) across generations.“.

Principle: Memetics is a theory of the evolution of culture based on Darwinian principles with the meme as the unit of culture. The term “meme” was coined by biologist Richard Dawkins in his 1976 book The Selfish Gene. The conveyor of the information being copied is known as the replicator, with the gene functioning as the replicator in biological evolution. Dawkins proposed that the same process drives cultural evolution, and he called this second replicator the “meme,” citing examples such as musical tunes, catchphrases, fashions, and technologies. Like genes, memes are selfish replicators and have causal efficacy; in other words, their properties influence their chances of being copied and passed on. Some succeed because they are valuable or useful to their human hosts while others are more like viruses.

Application: Human biological evolution (better genes) has been relatively slow in the last 100,000 years, while cultural evolution (better memes) has been incredibly rapid, leading to unprecedented technological and societal change.

Strategist’s Note: In this way, memes can be thought of as the new genes. At the level of genetic evolution, the pace of progress is far too slow for our needs now. So, while genes still provide the fundamental biological hardware, which sets the stage for memes, our species’ rapid advancement now comes primarily from memes.

Imaginary minds

It seems that humans are not very selective in ascribing a mind to others – we readily assume things are way more intelligent than they actually are, even in the face of clear evidence to the contrary – like with AI models like ChatGPT.

As of the time I am writing this, we’re living through times where people are developing more than merely “user – tool” relationships with their AI chatbots [1], [2], [3], [4].

But the way these popular forms of AI (classified under “Generative AI”) learn today is a brute force method (vs. human learning). When the power of computing is added to learning, you get something that seems intelligent.

AI is narrowly “intelligent”, no doubt about it – Current LLMs excel at specific tasks (like language generation) within a defined domain, making them narrowly intelligent. But beyond that, popular forms of AI (esp. LLMs) merely produce an illusion of a mind, which, I believe, is really our own doing.

We trick ourselves into thinking there is someone intelligent on the other side.

Bennett points out: “Humans don’t learn math the way GPT-3 learns math. Indeed, humans don’t learn language the way GPT-3 learns language.“.

True learning in humans does not happen by rote – To truly understand something is to have a robust internal model of it. Humans learn through simulating.

To me, this difference between what humans are doing and what LLMs are doing is a question of Simulation vs Prediction. Both predict but the latter is mathematically predicting a sequence while the former is playing through a simulation and language is used as a transfer mechanism. To us, language is a means; to AI, it is the end.
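
To make the Simulation vs Prediction contrast concrete, here is a deliberately tiny Python sketch. It is my own illustration, not something from Bennett’s book, and it is not how real LLMs or brains work. `BigramPredictor` stands in for sequence prediction by counting which word tends to follow which, while `simulate` stands in for playing a plan forward through a hypothetical internal world model (`WORLD`). All names and the toy data are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from typing import List, Optional


class BigramPredictor:
    """Predicts the next word purely from co-occurrence counts in text."""

    def __init__(self, corpus: str) -> None:
        self.counts = defaultdict(Counter)
        words = corpus.split()
        for current, following in zip(words, words[1:]):
            self.counts[current][following] += 1

    def predict(self, word: str) -> Optional[str]:
        followers = self.counts.get(word)
        if not followers:
            return None  # this word was never seen with a follower
        return followers.most_common(1)[0][0]


# Hypothetical world model: (state, action) -> next state.
WORLD = {
    ("cup on table", "push"): "cup on floor",
    ("cup on table", "lift"): "cup in hand",
    ("cup in hand", "release"): "cup on floor",
    ("cup in hand", "put down"): "cup on table",
}


def simulate(state: str, actions: List[str]) -> str:
    """Plays a sequence of actions forward through the world model."""
    for action in actions:
        # An action the model does not know leaves the state unchanged.
        state = WORLD.get((state, action), state)
    return state


predictor = BigramPredictor("the cup is on the table and the cup is full")
print(predictor.predict("cup"))  # is
print(simulate("cup on table", ["lift", "release"]))  # cup on floor
```

The predictor can only continue word sequences that resemble what it has already seen. The simulator can answer “what happens if I lift the cup and then let go?” for a combination of actions it was never shown as a sequence, because it runs the steps through its model of the world. That, in miniature, is the difference I am pointing at.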

Language is a powerful tool, but there is a reason it emerged after the Theory of Mind: Language needs to tread a fine line between exactitude and usefulness.

If the language you use is too precise, like a legal contract, you will spend far too much time speaking or writing it, which is tedious not only for you but also for the listener or reader. Language here would cease to be practically useful. That is why we gloss over any “Terms & Conditions” page and hit “I Agree” without reading.

But if you use language that is too vague, the person you are trying to communicate with may not understand you or may even misunderstand you. Again, language loses its practical use.

As a communicator, Theory of Mind enables you to peer into the mind of someone else and understand the appropriate level of detail for them. And as the recipient, ToM allows you to understand the intent behind the message. It is this inability of AI (as of today) that leads to apocalypses with Paper-clips.

That is why, I think, Bennett is right when he says “… if we want true humanlike AI systems, theory of mind will undeniably be an essential component of that system.”

And so we arrive at the end of Bennett’s book – it’s been quite a ride, from basic neural circuitry, to neuromodulators, to reinforcement learning, to simulations, to the nature of you, to the Theory of Mind, to language and labelling, to implications for AI. Thank you for being on the chase with me.

In sequence, these are the five breakthroughs in human intelligence per Bennett: Steering, Reinforcing (Reinforcement Learning), Simulating, “Mentalising” (Theory of Mind), and Language. “Each breakthrough was possible only because of the building blocks that came prior.“.

The Sixth Breakthrough

Non-fiction authors sometimes get philosophical at the end of their books. For Bennett, this means talking about breakthrough #6 and its differences from the rest.

He says that artificial superintelligence (ASI) is likely to be the sixth breakthrough, and while it was evolution that had been doing the work all this while, the sixth breakthrough will come from us.

This will be the time “… when intelligence unshackles itself from these biological limitations.“. Whatever we create, it will be influenced by the five breakthroughs. And whatever we create will not just be a test of our intelligence but also of our values.

Bennett says that the “… universe has passed us the baton.“.

But I don’t agree. The universe will continue its unceasing march, whether that is a march towards the Big Chill or the Big Crunch.

Really, it is evolution that has passed us the baton. Will we – the children of evolution, intelligent and powerful – be wise enough to carry on its legacy?

High-Signal Quotations


Citation: All text in the following section is cited from – Bennett, Max. A Brief History of Intelligence: Why the Evolution of the Brain Holds the Key to the Future of AI. Kindle Edition.


  • … original purpose of neurons and muscles may have been the simple and inglorious task of swallowing.
  • Affect, despite all its modern color, evolved 550 million years ago in early bilaterians for nothing more than the mundane purpose of steering.
  • Helmholtz suggested that much of human perception is a process of inference—a process of using a generative model to match an inner simulation of the world to the sensory evidence presented. The success of modern generative models gives weight to his idea; these models reveal that something like this can work, at least in principle. It turns out that there is, in fact, an abundance of evidence that the neocortical microcircuit is implementing such a generative model.
  • The reason the neocortex is so powerful is not only that it can match its inner simulation to sensory evidence (Helmholtz’s perception by inference) but, more important, that its simulation can be independently explored. If you have a rich enough inner model of the external world, you can explore that world in your mind and predict the consequences of actions you have never taken.
  • Just as the explanations of sensory information are not real (i.e., you don’t perceive what you see), so intent is not real; rather, it is a computational trick for making predictions about what an animal will do next.
  • How does the aPFC “control” behavior? The idea presented here is that it doesn’t control behavior per se; it tries to convince the basal ganglia of the right choice by vicariously showing it that one choice is better and by filtering what information makes it to the basal ganglia. The aPFC controls behavior not by telling but by showing.
  • Any level of goal, whether high-level or low-level goals, has both a self model in the frontal neocortex and a model-free system in the basal ganglia. The neocortex offers a slower but more flexible system for training, and the basal ganglia offers a faster but less flexible version for well-trained paths and movements.
  • … if we want true humanlike AI systems, theory of mind will undeniably be an essential component of that system.
  • It is possible, perhaps inevitable, that continuing to scale up these language models by providing them with more data will make them even better at answering commonsense and theory-of-mind questions. But without incorporating an inner model of the external world or a model of other minds—without the breakthroughs of simulating and mentalizing—these LLMs will fail to capture something essential about human intelligence. And the more rapid the adoption of LLMs—the more decisions we offload to them—the more important these subtle differences will become.
  • … emergence of language was as monumental an event as the emergence of the first self-replicating DNA molecules.
  • When humans use language with each other, there is an ungodly number of assumptions not to be found in the words themselves … both parties are guessing what is going on in the other’s head.

The Takeaways

First, let me take a moment to acknowledge a remarkable aspect of this book itself: its author.

While Bennett humbly casts himself as merely the thread that runs through pearls of wisdom scientists have provided, the sheer intellectual rigour required to synthesise such a vast, intricate, and often chronologically disparate body of knowledge into a logical, accessible, and internally consistent structure is astounding.

Bennett does not come from a neuroscience or medical academic background.

Despite this, he has crafted an incredibly compelling and coherent narrative of complex scientific concepts, drawing meticulously from the work of several scientists and Nobel laureates. Additionally, he weaves in concepts from AI and, in places, psychology. It truly reads like the work of someone erudite.

This daunting task, typically expected from authors with extensive academic credentials, speaks volumes about Bennett’s formidable grasp and dedication to the subject. And although it is not the primary focus of my review, Bennett spends a lot of time explaining individual brain structures, how they work, and how their morphology changed from Proto brains to modern human brains.

For a much deeper dive into how the brain evolved from simple wiring to the complex system it is today, please consider buying the book – as I said, Bennett goes into much more detail on how different brain structures – such as the basal ganglia (reinforcement learning), hypothalamus (actual reward assessment), cortex (pattern recognition), neocortex (advanced intelligence) and much more – actually work.

I thought this would be a book first about AI, and while Bennett does talk about the implications of how our intelligence evolved on how artificial intelligence may evolve – this is mostly a book about biological intelligence. In that way, the sub-title of the book “Why the Evolution of the Brain Holds the Key to the Future of AI” is a little misleading.

This is a thick book with concepts that are even thicker; it takes time to understand the full import of what is being said.

But be patient with the book and it will reward you – Bennett has truly done outstanding work here.

The exposition of how the mind evolved and how it works makes you realise how complicated these daily actions that we take for granted really are.

For instance, suppose your friend shows up to a test without a pen and you give them your extra one. This requires sophisticated intelligence: you need to know that a pen is a necessary tool for this job, that you will need only one pen, that the person who does not have a pen will not be able to finish the job, that you would have felt bad had you not been able to finish the job, that parting with an extra pen does not harm you, and that it, in fact, uplifts your social status.

This is enormously intellectually demanding, yet we do feats like this and much more every day.

It is truly humbling to think about evolutionary timescales and the incomprehensible amount of luck humans needed to get here, yet, here we are.

Powerful book, and combined with Harari’s Sapiens and Nexus, makes for a good 101 understanding of humankind.

Your 3-Point Action Plan

  1. Practice Vicarious Trial and Error (VTE). The next time you face a moderately complex decision, consciously stop before acting. For five minutes, mentally simulate at least two different courses of action and their likely outcomes. Choose the path that seems most promising in your simulation.
  2. Apply the “Narrator” Principle to a Disagreement. Think of a recent disagreement you had. For five minutes, try to articulate the other person’s perspective from their point of view. What was their internal simulation or “story” of the situation? This is a practical exercise in Theory of Mind.
  3. Identify a Core “Meme”. Reflect on your own work or a core belief you hold. What is the central idea or “meme” you are trying to replicate in the minds of others? Practice articulating this core meme in a single, clear sentence. This is an exercise in leveraging language to transfer your simulation.

Aviral Prakash

  1. You may think, “How does the organism instinctually know what is good in the first place?”. Simple answer, evolution. Because only those organisms survive to reproduce who correctly (even if they got there by chance) labelled a predator and toxins as bad, and food as good. ↩︎
  2. This is mostly accurate and a great simplification. The key nuance is that it’s predominantly about strengthening (or weakening) existing connections between neurons, rather than forming entirely new physical connections from scratch every time. While new connections can form, the primary mechanism of learning is changing the efficacy of the billions of connections already present. ↩︎
  3. I wrote “memory/learning” because these are deeply intertwined in our brains. They are two sides of the same coin. You can’t have learning without some form of memory to retain what was learned, and memory wouldn’t exist without a prior learning event. They are inextricably linked. ↩︎
  4. In Reinforcement Learning, these two together are known as the “value” of a state (V(s)), and represent the total expected cumulative future reward an agent can anticipate receiving if it starts from that state and then follows a given policy (strategy) onwards. When that policy is an optimal one, this is the optimal value of the state. ↩︎
  5. It is in this way that the three well known properties of perception are generated. First, “Filling in”: When there are gaps in sensory data (e.g., your blind spot, or occluded parts of an object), your narrator “fills them in” based on its best prediction from the surrounding context and prior knowledge. You don’t perceive a gap; you perceive a complete, inferred scene. Second, “Perceiving one thing at a time”: Your narrator constructs a coherent, unified perception, rather than a jumble of raw sensory inputs. It selects and integrates relevant predicted information. And third, “Inertia of the prediction”: Once your narrator has learned a strong, consistent prediction (a “habit” of perception), it takes a very strong, sustained prediction error (contradictory sensory data) to force it to change that prediction. This explains optical illusions that persist even when you know they are illusions. ↩︎
  6. Well, actually, it’s a little more complicated than “fruits enabled big brains”. You see, a fruit-based diet is seen in scientific circles as both a cause and a consequence of primates’ larger brains. Fruits are rich in readily digestible sugars (carbohydrates) and often fats, providing a high-energy, nutrient-dense food source compared to leaves. This excess energy could have provided the necessary “fuel” to support the growth and maintenance of larger, more metabolically demanding brain tissue. But at the same time, finding and exploiting fruit is often more cognitively demanding than simply munching on abundant leaves. Fruit trees are often clumped (not evenly distributed) and ripen seasonally (not always available). A primate needs to remember the locations of many fruit trees across a large home range, to remember when specific trees are ripe or when a preferred fruit species will be in season again, and to navigate efficiently between these dispersed and time-sensitive resources (which links to the earlier discussion of spatial maps). Plus, many fruits require complex processing to eat (e.g., peeling tough skins, breaking open hard nuts, extracting pulp). This requires problem-solving abilities, manual dexterity and fine motor control (which links to opposable thumbs), and tool use (e.g., cracking nuts). Discriminating ripe from unripe fruit might also require advanced colour vision (many primates have trichromatic vision), smell, and touch. ↩︎
  7. The “neocortical column as a general-purpose simulation generating unit” is a strong theoretical claim by figures like Jeff Hawkins, based on the observed uniformity of cortical microcircuitry. It’s not universally proven for every aspect of its function, but it’s a leading and highly influential hypothesis that makes intuitive sense given the neocortex’s versatility. In that sense, interesting parallels can be drawn between you and AI such as ChatGPT, even though the underlying mechanisms are different. ↩︎
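
As a rough illustration of footnote 4, the value of a state under a fixed policy can be computed by repeatedly applying the update "value = immediate reward + discounted value of the next state". The Python sketch below is my own toy example and not from the book: the three-state chain in `TRANSITIONS`, its rewards, and the discount factor `GAMMA` are all assumptions chosen to keep the arithmetic easy to follow.

```python
GAMMA = 0.9  # assumed discount factor: how much future reward counts today

# Hypothetical deterministic chain under one fixed policy:
# state -> (next_state, immediate_reward). "goal" is terminal.
TRANSITIONS = {
    "start": ("middle", 0.0),
    "middle": ("goal", 1.0),
}


def evaluate_policy(sweeps: int = 50) -> dict:
    """Iterative policy evaluation: V(s) = reward + GAMMA * V(next state)."""
    values = {"start": 0.0, "middle": 0.0, "goal": 0.0}
    for _ in range(sweeps):
        for state, (next_state, reward) in TRANSITIONS.items():
            values[state] = reward + GAMMA * values[next_state]
    return values


values = evaluate_policy()
print(values)
```

In this toy chain, "start" earns no immediate reward, yet it ends up with a positive value because it leads towards the rewarded step. That is the sense in which V(s) bundles the immediate and the anticipated future reward together.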
