
Meet HRM: A Brain-Inspired AI that Solves what GPTs Can't

  • Writer: Igor Alcantara
  • Aug 7
  • 12 min read

Updated: Oct 13



I want you to picture the opening of 2001: A Space Odyssey, set to Thus Spoke Zarathustra. Stanley Kubrick gives us one of the most iconic scenes in cinema history. One of our ancestors, possibly a Homo habilis, stands at the dawn of cognition. Surrounded by the bones of the past, they lift one from a carcass and, with the abstract thought that only advanced primates possess, imagine its potential.


Our great-great-g(n)reat-grandparent sees beyond what’s in front of them. They use imagination. Abstraction. They realize they can extend their body with an object. They can project force, make a tool, shift destiny. And with that revelation, they hurl the bone into the sky.


That spinning bone becomes a symbol: the first human tool, born not from instinct, but from thought.


As it tumbles through time, the scene cuts, suddenly, violently, to a spaceship orbiting Earth. A ship run by HAL-9000, a conscious artificial intelligence. Perhaps humanity’s final invention. The advanced AI might take it from here.


Between that bone and that AI lies the entire arc of human evolution. From primal survival to complex reasoning. From tools that extended our bodies... to minds that may one day exceed our own.


A very recent study made me reflect: is this the dawn of an artificial intelligence that can soon think? Our monolith. A mind not limited to mimicking humans through clever autocomplete, but one capable of actual reasoning. Of breaking down complex tasks, navigating uncertainty, making plans, solving puzzles, not by parroting language but by thinking.


For decades, researchers have chased the dream of artificial intelligence that truly reasons, solving problems and making judgments not just by memorizing what we’ve done, but by thinking through its own internal processes, just as the human brain does. The brain, after all, is arguably evolution’s finest achievement, with a complexity and adaptability that no computer has yet matched.


Too much? Possibly. But not as far off as it seemed last year.


A new paper, “Hierarchical Reasoning Model” (HRM), takes us closer to that dream. But what exactly does it achieve, and how does it work? Let’s embark on that journey, past toolmakers and spaceships, and into the architecture of a machine that reasons almost like we do.


And yes, this one might just be a bone worth watching, pun absolutely intended.


The Problem with Today's AI: Smart Parrots, Shallow Thinkers


Let’s get something out of the way: today’s best AI models, including those with billions of parameters, are glorified mimics. They might sound reasonable or even “human,” but they’re more like parrots. Really smart parrots.


They complete text, predict the next word, and simulate reasoning by writing it out one token at a time (if you want to know how they work, read my article about the mechanism of attention). That simulated reasoning is “Chain-of-Thought” (CoT) prompting: getting the model to “show its work” like a student in math class. It works, to a point. It is great with language and images, but scratch the surface, give it a really complex task, and things fall apart.


  • One wrong intermediate step? The whole reasoning fails.

  • Want to solve a 9x9 Sudoku or a 30x30 maze? Forget it.

  • Try generalizing to an entirely new puzzle? Good luck.


These models are shallow. Literally. Transformers, the core of LLMs, operate in fixed layers of computation, depth-limited by design (where do you think Deep Learning gets its name?). They aren’t Turing-complete. They can’t perform loops or recursive reasoning unless you simulate it through language, which is both inefficient and unreliable for really advanced problems. That is not their fault, though. LLMs are Language Models, and they work great in that realm. Keep in mind that language is a tool for human communication, not the core of thought or reasoning itself.


CoT reasoning is ultimately a hack. It’s fragile (one bad step ruins the answer), slow (it outputs mountains of text), and extremely data-hungry (it needs endless examples to learn). Worse, it doesn’t really mirror how the human brain solves problems: in silence, through “latent reasoning” that happens inside before any words are spoken.


So what’s the alternative? The authors of HRM believe the answer lies in direct inspiration from neuroscience: build an architecture that doesn’t just process information like the brain, but that “thinks” in a similar, deeply layered, temporally separated, recurrent way. HRM does not want or need to be bigger, but to be smarter.


A Brain-Inspired Architecture That Thinks in Layers


The Hierarchical Reasoning Model doesn’t try to predict the next word. Instead, it tries to reason like a brain. According to modern neuroscience (check some references here, here, and here), our brains solve problems through three key principles:


  • Hierarchical Processing: Higher-order regions of the cortex create abstract plans, while lower-order regions handle specific details.

  • Temporal Separation: Different brain areas operate at different speeds; some fast and reactive, others slow and deliberate.

  • Recurrent Connectivity: Feedback loops allow us to revise, refine, and converge on better answers over time. We learn from experience.


HRM borrows directly from this structure, and that’s what makes it so different. It introduces a two-module recurrent architecture:


  • High-Level Module (H): Responsible for “slow, abstract planning.” It’s like your prefrontal cortex mulling over the strategy for winning a chess game.


  • Low-Level Module (L): Handles “rapid, detailed computations” within each high-level step. Imagine your hand’s fine adjustments moving each chess piece.


These modules operate at different timescales, just like in your cortex. The L-module runs multiple steps rapidly, crunching details. Then, the H-module wakes up, evaluates the situation, adjusts the plan, and sends the L-module off again.


This process loops like nested gears. The low-level spins fast, the high-level turns slowly, each influencing the other. The result? A dynamic, multi-phase reasoning loop that solves problems in layers, just like us.


HRM is designed to mimic the representation levels in our brain

How It Works:


  1. Input Transformation: The input (like a Sudoku or maze) is embedded into a working internal representation.


  2. Nested Cycles: The L-module performs multiple fast steps, gradually updating its own state while the H-module sits still (holding to the ongoing plan).


  3. Hierarchical Convergence: When the L-module finishes a set of cycles (reaching a local “equilibrium”), only then does the H-module take a step, incorporating everything learned in that lower-level flurry.


  4. Reset and Repeat: With each high-level update, the L-module is “reset” into a new context, and the cycle repeats, deepening the reasoning chain with every round.


  5. Final Output: After a chosen number of cycles, the model converts its internal state to a prediction, no need for explicit text explanations or “talking its way through” the answer.
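To make those five steps concrete, here is a minimal sketch of the loop in Python. Everything here is an illustrative simplification: the class name, the GRU-cell stand-ins, and the dimensions are my own choices, not the paper's actual layers, but the control flow follows the nested-cycle recipe above.

```python
import torch
import torch.nn as nn

class TinyHRM(nn.Module):
    """Toy version of the HRM loop: illustrative only, not the paper's model."""
    def __init__(self, in_dim, hidden=128, out_dim=10, N=4, T=8):
        super().__init__()
        self.N, self.T = N, T                        # N slow cycles of T fast steps
        self.f_input = nn.Linear(in_dim, hidden)     # 1. embed the puzzle
        self.f_low = nn.GRUCell(2 * hidden, hidden)  # fast, detailed L-module (stand-in)
        self.f_high = nn.GRUCell(hidden, hidden)     # slow, abstract H-module (stand-in)
        self.f_output = nn.Linear(hidden, out_dim)   # 5. decode the answer

    def forward(self, x):
        x_emb = self.f_input(x)
        zL = torch.zeros(x.size(0), x_emb.size(-1))  # low-level state
        zH = torch.zeros_like(zL)                    # high-level state (the "plan")
        for _ in range(self.N):                      # slow gear
            for _ in range(self.T):                  # 2. fast gear: L crunches details
                zL = self.f_low(torch.cat([x_emb, zH], dim=-1), zL)
            zH = self.f_high(zL, zH)                 # 3. H steps only after L settles
            # 4. the next cycle restarts L's thinking in the context of the new plan zH
        return self.f_output(zH)                     # 5. one prediction, no text trace
```

Notice that the only output is the final decoded state; there is no token-by-token explanation along the way.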


Technical Innovations


What makes HRM so compelling isn’t just that it performs well; it’s how it learns to reason. Unlike traditional AI models that brute-force their way through training with massive compute and memory, HRM introduces a series of technical innovations that feel surprisingly... biological. From skipping the memory-hungry backpropagation methods most recurrent networks rely on, to using a more brain-like approach to learning and decision-making, HRM seems less like a silicon brute and more like a thinking system. These breakthroughs, clever, elegant, and efficient, aren’t just engineering tricks. They’re architectural statements about how intelligence might emerge from structure, not just scale.


Let’s break them down.


  • One-Step Gradient Approximation: Traditional recurrent neural networks use a costly and biologically implausible method called “backpropagation through time” (BPTT), which requires remembering every hidden state step. HRM skips this, using a mathematical trick from equilibrium models (like those theorized for the brain) to approximate learning with negligible extra memory.


  • Deep Supervision: To train HRM efficiently, the model uses feedback after each “segment,” detaching previous states to stabilize learning. This ensures that learning remains local, another handshake with biology.


  • Adaptive Computation Time (ACT): Borrowing from psychology’s classic “Thinking, Fast and Slow,” HRM can dynamically decide how long to think (how many cycles to take) on each problem, spending more effort on hard tasks.


I will detail these concepts more soon, but do you know what is also incredible? This model has just 27 million parameters, no pretraining, no human CoT supervision, and only 1,000 training examples per task. Yes, you read that right: a sample size of only one thousand. And 27 million may seem like a lot, but it is hundreds or thousands of times smaller than the parameter counts of the latest models from ChatGPT, Claude, or Gemini.


Let’s see what it can do.


The Tests: Not Your Average Benchmark


To prove what it's capable of, HRM was tested on three complex reasoning tasks. These aren’t your average “fill-in-the-blank” LLM tasks. These are serious tests of logic, planning, abstraction, and generalization, and they are notoriously difficult reasoning benchmarks. Some tasks are simply too difficult for a GPT to perform. Remember my ChatGPT Find Waldo article? So, let's get to it.


1. Sudoku-Extreme


Forget the “easy” Sudokus computers and retired people love. HRM was tested on the “Sudoku-Extreme” set, made up of puzzles so difficult that the best vanilla transformers (even with 175 million parameters and a million training examples) barely solve any. These puzzles are not cherry-picked. They’re hard. The dataset was built from:


  • Kaggle’s standard Sudoku set

  • The infamous 17-clue dataset (the minimum number of clues for a unique solution)

  • Hand-crafted, human-recognized “extreme” Sudoku puzzles


The average difficulty? 22 backtracks per puzzle (versus 0.45 in other research datasets).


2. Maze-Hard


Find the optimal path in a 30x30 maze. Sounds simple? Not really.

This benchmark is GPU-unfriendly, highly branching, and rewards algorithmic planning, not just vision or memorization. Difficulty is based on the length of the shortest path.

State-of-the-art LLMs have near 0% accuracy on these puzzles.


3. ARC-AGI (Abstraction and Reasoning Corpus)


This is the holy grail for testing “general intelligence”: puzzles that require the machine to deduce abstract rules and then apply them to new situations. Think of this as an IQ test for machines. It consists of small image grid puzzles where the AI must infer a rule from a few examples, then apply it to a new input.


This task requires inductive reasoning and symbolic abstraction. You can’t memorize answers. You have to learn how to think.


HRM tackled two versions:


  • ARC-AGI-1: The classic set

  • ARC-AGI-2: A newer, more difficult version emphasizing multi-step logic and generalization


Left: Visualization of benchmark tasks. Right: Difficulty of Sudoku-Extreme examples.

The Results: When Less Is More


Let’s put this in perspective. HRM was trained from scratch, without massive corpora, without CoT tricks, and without pretraining. Just 1000 examples per task.


Now look at this:

| Task | HRM Accuracy | Claude 3.7 | o3-mini-high | DeepSeek R1 |
|---|---|---|---|---|
| ARC-AGI-1 | 40.3% | 21.2% | 34.5% | 15.8% |
| ARC-AGI-2 | 5.0% | 1.3% | 3.0% | 0.9% |
| Sudoku-Extreme | 74.5% | 0% | 0% | 0% |
| Maze-Hard | 100% | 0% | 0% | 0% |


These are stunning numbers. Particularly when the competition includes models with:


  • 100x more parameters

  • 1000x more training data

  • Pretraining on internet-scale corpora

  • Chain-of-Thought guidance


And still, HRM wins. Why? Because it doesn’t just mimic reasoning. It performs it.


The authors didn’t just evaluate outcomes, they also “peeked inside” the process. They studied the trajectories of the H and L modules’ hidden states as the model solved different tasks.


What did they find?


  • Maze Solving: HRM first explores multiple possible routes through the maze, then systematically eliminates dead ends, refining the solution in stages.


  • Sudoku: The model shows iterative reasoning reminiscent of human backtracking: exploring, correcting, even undoing choices in pursuit of a valid grid.


  • ARC Puzzles: HRM takes incremental, hill-climbing-like steps, adapting its approach per problem class.


Most compellingly, across tasks, the model adapts its problem-solving approach, rather than blindly applying the same algorithm.


How HRM solved Sudoku-Extreme, step-by-step

Under the Hood: How HRM Actually Works


Let’s get technical for a moment.


The HRM consists of four key components:


  1. Input network (fI): Embeds the input

  2. Low-level module (fL): Fast, detailed recurrent computations

  3. High-level module (fH): Slow, strategic updates

  4. Output network (fO): Decodes the final prediction


The computation unfolds in N cycles of T steps each:


  • For every cycle, the L-module runs T steps with a fixed high-level state.

  • After T steps, the H-module updates based on the L-module’s final state.

  • Then, the L-module resets, and the process repeats.


This loop allows NT steps of effective computation, far deeper than typical Transformers.
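For a sense of scale, here is how the toy sketch from earlier would be run; the Sudoku-style input size is purely illustrative and the output head is left at its toy default.

```python
# With N = 4 cycles of T = 8 steps, one forward pass chains 4 * 8 = 32
# low-level updates plus 4 high-level updates: far more effective depth
# than a fixed stack of layers, at the same parameter count.
model = TinyHRM(in_dim=81 * 10, N=4, T=8)   # e.g. a flattened, one-hot 9x9 Sudoku grid
x = torch.randn(1, 81 * 10)                 # dummy input, for illustration only
prediction = model(x)                       # silent internal reasoning, then one decode
```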


Hierarchical Convergence

Some of the most advanced AI algorithms are Recurrent Neural Networks (RNNs), a type of neural network architecture designed to process sequences of data by maintaining a hidden state that captures information from previous time steps. This structure allows RNNs to model temporal dependencies and patterns across sequences, making them suitable for tasks like language modeling and time series analysis.


A problem with RNNs is that they converge too quickly. They “settle” and stop computing anything meaningful. HRM solves this with hierarchical convergence:


  • The L-module converges locally per cycle.

  • The H-module nudges the process into a new region of solution space.

  • This restarts the L-module’s thinking in a new context.


The result is a sequence of stable yet evolving computational phases, like zooming in and out of a problem.


No Backprop Through Time (BPTT)? No Problem.

Backpropagation Through Time (BPTT) is a way to train recurrent neural networks by unfolding them over each time step, so they look like a long chain of connected layers. The algorithm then works backward along this chain, adjusting the network’s weights based on how each step affected the final result, but this method can use a lot of memory since it has to remember everything that happened at every step.


BPTT is expensive. It consumes memory like a black hole and is biologically implausible. It works in a way that brains don’t: BPTT relies on global information for learning, while the brain learns using only local changes.


Instead of relying on BPTT, the Hierarchical Reasoning Model (HRM) uses a much simpler approach to training. It looks only at the final state of each of its internal modules to figure out how to update itself. Think of this as taking a quick snapshot at the end, rather than recording the entire journey. This method, called a one-step gradient approximation, which I explained before, comes from ideas in Deep Equilibrium Models (DEQ). It means HRM can learn using a tiny, constant amount of memory, no matter how many steps it runs, which makes training both efficient and more practical.
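Here is a hedged sketch of that idea, continuing the toy model from earlier: run almost the entire nested loop without tracking gradients, then backpropagate only through the final low-level and high-level updates. This is my simplified reading of the DEQ-style trick, not the paper's exact implementation.

```python
def one_step_gradient_forward(model, x):
    x_emb = model.f_input(x)
    zL = torch.zeros(x.size(0), x_emb.size(-1))
    zH = torch.zeros_like(zL)
    # The long "journey": no gradient bookkeeping, so memory stays constant.
    with torch.no_grad():
        for _ in range(model.N - 1):
            for _ in range(model.T):
                zL = model.f_low(torch.cat([x_emb, zH], dim=-1), zL)
            zH = model.f_high(zL, zH)
        for _ in range(model.T - 1):
            zL = model.f_low(torch.cat([x_emb, zH], dim=-1), zL)
    # The "snapshot": only these final updates take part in the backward pass.
    zL = model.f_low(torch.cat([x_emb, zH], dim=-1), zL)
    zH = model.f_high(zL, zH)
    return model.f_output(zH)
```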


Adaptive Compute Time

Inspired by Daniel Kahneman’s Thinking, Fast and Slow, HRM also learns when to stop thinking. Using a deep Q-learning policy, HRM dynamically decides whether to continue reasoning or halt. Each segment is like a thought. Too simple? Stop early. Too complex? Keep going. This mechanism saves compute and mimics how the brain modulates cognitive effort based on task difficulty.
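A rough sketch of what that halting decision could look like, again on top of the toy model. The Q-head, the simple "halt beats continue" rule, and the max_segments budget are illustrative assumptions on my part, not the paper's exact ACT mechanism.

```python
q_head = nn.Linear(128, 2)  # scores for [halt, continue]; hidden size matches the toy model

def reason_with_act(model, x, max_segments=8):
    x_emb = model.f_input(x)
    zL = torch.zeros(x.size(0), x_emb.size(-1))
    zH = torch.zeros_like(zL)
    for segment in range(max_segments):
        for _ in range(model.T):                     # one "thought" segment
            zL = model.f_low(torch.cat([x_emb, zH], dim=-1), zL)
        zH = model.f_high(zL, zH)
        q_halt, q_continue = q_head(zH).unbind(dim=-1)
        if (q_halt > q_continue).all():              # easy problem: stop thinking early
            break
    return model.f_output(zH), segment + 1           # answer plus segments actually used
```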


Does HRM “Think” Like a Brain?


Now here’s where things get exciting.


Researchers compared HRM’s internal states with neural principles from actual mammalian brains, using a measure called Participation Ratio (PR), a proxy for the dimensionality of thoughts.
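For readers who want the formula: PR is computed from the eigenvalues of the covariance of hidden states, PR = (Σλ)² / Σλ². A quick NumPy sketch (my own illustration, not the paper's analysis code):

```python
import numpy as np

def participation_ratio(states: np.ndarray) -> float:
    """states: (num_samples, hidden_dim) array of zH or zL vectors."""
    centered = states - states.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (len(states) - 1)
    eigvals = np.linalg.eigvalsh(cov)          # covariance is symmetric
    # PR is ~1 when activity collapses onto one dimension and approaches
    # the full hidden dimensionality when variance is spread evenly.
    return float(eigvals.sum() ** 2 / (eigvals ** 2).sum())
```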


The results?


  • The high-level module’s state (zH) had a 3x higher dimensionality than the low-level module (zL).

  • This mirrors the human brain, where higher cognitive areas operate in higher-dimensional spaces.

  • Importantly, this hierarchy emerged through training, not architecture.


In other words, HRM developed a brain-like structure to solve problems, not because we forced it, but because it worked. It's Evolution, babe!


(a) Anatomical illustration of mouse cortical areas, color-coded by functional modules. (b) Correlation between Participation Ratio (PR) and hierarchical position across different mouse cortical areas. Higher positions in the hierarchy exhibit significantly higher PR values compared to lower sensory areas.

Unlike many deep networks, it doesn’t suffer from “neural collapse,” where internal states compress into a low-dimensional blob. Instead, HRM maintains rich, varied representations, a critical feature for reasoning flexibility.


The Implications: More Than Just Another Model


The results of HRM are more than just “benchmarks beaten”; they represent the emergence of a fundamentally new approach to reasoning in machines. So, what does all this mean?


  • Smarter with less: HRM achieves reasoning with fewer parameters and less data.

  • Closer to how we think: It organizes reasoning hierarchically, recurrently, adaptively, like the brain.

  • Beyond token-chaining: It shows that AI doesn’t have to simulate logic through language to reason.

  • Emergent structure: Its internal organization isn't programmed, it's learned.


This suggests a future for AI where progress comes not only from scaling up model size, but from taking cues directly from the brain’s strategies: hierarchy, temporal separation, and recurrent feedback, refined by millions of years of evolution.


So… Are We Getting Closer to True Intelligence?


HRM isn’t a finished product. But it’s a glimpse into a new paradigm, one where reasoning isn’t bolted on to language models, but baked into the architecture from the start.


Let’s be honest: we’re still far from AGI, that kind of Super AI that is better than humans at everything. This is brand-new research, published just 10 days ago, that needs further development and confirmation in many more use cases. But HRM makes a compelling argument that we’ve been asking the wrong questions.


It’s not just about how much data you feed the model. It’s about how the model processes it.

The human brain isn’t the most powerful computer in terms of raw FLOPs. But it’s structured in a way that allows for compositionality, abstraction, and dynamic reasoning. HRM doesn’t replicate this perfectly, but it’s heading in that direction.


Maybe the next evolutionary leap in tools isn’t more bones or bigger spaceships. Maybe it’s this. Just as Kubrick’s bone became HAL-9000, the advanced AI may become a brain, an artificial one, made by and for us, yet capable of tracing its own, recursive, uniquely powerful “thoughts.”


We are no longer just teaching machines to echo human answers; we’re starting to build systems that can think. That can plan, adapt, and solve problems in ways that feel less like imitation and more like cognition. Tools are becoming thinkers. Algorithms are inching toward understanding. And perhaps, just perhaps, these artificial reasoners will not only follow in our footsteps but chart paths we never imagined.


Welcome to the dawn of artificial cognition, a new chapter in the human story, maybe the last. A machine odyssey begins.





 
 
 
