Beyond Transformers: RockAI’s Alternative Path to AGI

In a world obsessed with ever-larger cloud models, a Shanghai startup is betting on smaller, smarter AI that learns and remembers like the human brain

At the bustling 2025 World Artificial Intelligence Conference (WAIC) in Shanghai, amid thousands of attendees marveling at the latest cloud-based AI demonstrations, something extraordinary was happening at a modest booth that most visitors might have overlooked. A robotic dog, completely offline and disconnected from the internet, was learning new tricks in real-time from a middle-aged visitor.

The man taught it a simple sequence: turn in a circle, then sit up and perform the classic “dog begging” pose. Within two minutes, the robotic dog had perfectly replicated the entire routine — without any pre-programmed instructions, remote control, or internet connection. Nearby, robotic hands played video games with surprising skill, navigating through Tetris-like puzzles and mining games with strategic precision, all powered by local AI models.

The booth’s representative told visitors that this represents “device-native intelligence: capable of offline operation, multimodal processing, and learning while being used.”

This wasn’t just another tech demo. It was a glimpse into a fundamentally different vision of artificial intelligence — one that challenges the very foundations of how we think about AI development, deployment, and the path to Artificial General Intelligence (AGI).

Connected Intelligence in a Disconnected World

Despite years of promises about AI-powered devices, we’re living in a technological paradox. Companies enthusiastically market “AI phones,” “AI glasses,” and “AI toys,” yet almost every meaningful AI interaction still requires a stable internet connection. The most sophisticated AI experiences remain tethered to the cloud, leaving users frustrated when networks fail and concerned about privacy when they don’t.

“Everyone talks about offline intelligence and on-device AI,” explains Zou Jiasi, co-founder of RockAI, the company behind the learning robotic dog, in a recent interview with Geek Park. “But between ideal and reality lie two nearly insurmountable mountains: computational power and energy consumption.”

This disconnect isn’t merely an engineering challenge — it represents a fundamental mismatch between current AI architectures and the physical realities of edge devices. While data centers enjoy virtually unlimited computational resources, mobile devices face harsh constraints: limited processing power, restrictive energy budgets, and the unforgiving physics of heat dissipation.

The practical constraints are sobering: most smartphones begin overheating after just minutes of running large models locally. AI glasses and toys rely on chips designed for basic connectivity, not complex reasoning. Even high-end devices struggle with the computational demands of Transformer-based models, leaving the vast majority of existing hardware unable to meaningfully participate in the AI revolution.

The Transformer Trap

To understand RockAI’s radical approach, we must first examine why the Transformer architecture — despite its spectacular success in cloud environments — creates fundamental problems for edge deployment.

The Transformer’s revolutionary innovation was the attention mechanism, which allows models to consider relationships between all parts of an input simultaneously. Imagine a traditional AI model as a factory worker processing information sequentially, with limited memory about previous steps. A Transformer operates like a master conductor who can see the entire orchestra at once, understanding how every musician relates to every other musician in real-time.

This “global handshake” approach — where every token must interact with every other token — creates extraordinary understanding capabilities. But it also creates a computational burden that scales quadratically with input length: each additional token must be compared against every token before it, so processing cost grows far faster than the input itself.
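A back-of-the-envelope sketch makes the quadratic scaling concrete. The FLOP count below covers only the two large attention matmuls and ignores constant factors, so the absolute numbers are illustrative; the ratio is the point.

```python
def attention_flops(seq_len: int, d_model: int) -> int:
    """Rough FLOP count for the two big matmuls in self-attention:
    Q @ K^T and (attention weights) @ V, each ~seq_len^2 * d_model."""
    return 2 * seq_len * seq_len * d_model

# Doubling the context length quadruples the attention cost:
ratio = attention_flops(2_000, 512) / attention_flops(1_000, 512)
print(ratio)  # → 4.0
```

On a data-center GPU that quadrupling is absorbed by parallel hardware; on a phone-class chip it shows up directly as heat and battery drain.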

“Transformer models are fundamentally designed for cloud environments where you have unlimited computational resources,” explains Yang Hua, RockAI’s CTO, in an interview with Qbitai. “When you try to force this architecture onto edge devices, it’s like asking a Formula 1 race car to navigate narrow mountain roads — the basic design assumptions completely break down.”

Mobile processors are designed more like efficient assembly lines, excelling at sequential, high-speed processing. When you ask them to perform the parallel, relationship-heavy computations that Transformers demand, they quickly become overwhelmed. The result is exactly what we see today: AI devices that overheat, drain batteries rapidly, or simply cannot run sophisticated models at all.

A Different Path

As Liu Fanping, CEO of RockAI, stated publicly during WAIC 2025, “Currently, AI development needs to overcome two mountains: one is backpropagation, and the other is Transformer.”

Rather than attempting to optimize Transformers for edge deployment — the approach taken by virtually every other company — RockAI made a more radical choice in early 2022, even before ChatGPT sparked the current AI revolution: they would rebuild the engine from scratch.

“We’re not trying to modify an F1 race car to drive on mountain roads,” Zou explains in the Geek Park interview. “We’re designing an entirely new off-road vehicle that can race through those mountains naturally.”

The Yan architecture abandons the attention mechanism entirely, replacing it with what RockAI calls a “feature-suppression-activation” system combined with compartmentalized activation. Instead of requiring all parameters to be active for every query, Yan models activate only the specific neural “zones” relevant to each task — much like how the human brain doesn’t fully illuminate when processing simple requests.
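RockAI has not published the internals of its “feature-suppression-activation” system, but the compartmentalized-activation idea can be sketched in miniature. Everything below is a hypothetical toy: the zone sizes, the `router` matrix, and the top-k selection are assumptions chosen to illustrate routing an input to a few “zones” while the rest stay idle.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: parameters split into independent "zones" (hypothetical sizes).
num_zones, zone_dim, d_in = 16, 64, 32
zones = [rng.standard_normal((d_in, zone_dim)) for _ in range(num_zones)]
router = rng.standard_normal((d_in, num_zones))  # scores zone relevance

def forward(x: np.ndarray, active_zones: int = 2) -> np.ndarray:
    """Run only the zones most relevant to this input; the rest stay idle."""
    scores = x @ router                            # one cheap routing matmul
    top = np.argsort(scores)[-active_zones:]       # pick the top-k zones
    out = sum(np.tanh(x @ zones[i]) for i in top)  # only k of 16 matmuls run
    return out / active_zones

y = forward(rng.standard_normal(d_in))
# Only 2 of 16 zones computed anything: ~1/8 of the dense compute.
```

The compute saving is the ratio of active to total zones, which is why the same idea is claimed to keep power low even as total parameter count grows.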

This bio-inspired approach yields dramatic efficiency gains: seven times faster training, five times higher inference throughput, three times better memory capacity, and significantly reduced power consumption compared to equivalent Transformer models. But the real breakthrough isn’t just efficiency — it’s a completely new capability that current AI systems lack entirely.

Memorizing and Learning on Edge Devices

Perhaps Yan 2.0’s most revolutionary feature is something that sounds mundane but represents a fundamental paradigm shift: true memory and autonomous learning on edge devices.

Current AI models, even when deployed locally, are essentially read-only systems. Once trained and deployed, they cannot modify their core neural networks to learn new information. Any “personalization” happens through external prompting, context windows, or retrieval systems — the model itself remains static, like a library book that can be read but never updated.

Yan 2.0 changes this through what RockAI calls “training-inference synchronization” — the ability to learn and update while actively being used. Unlike Transformers, which require massive GPU clusters for any learning update, Yan’s compartmentalized architecture enables localized learning through low-power backpropagation.

The technical implementation involves two key phases:

Memory Update Phase: The model determines which old knowledge can be forgotten, then extracts valuable information from current tasks and writes it directly into the memory module. This process doesn’t rely on external caches or databases, but uses a specialized neural network to simulate memory behavior, enabling dynamic erasure and incremental writing.

Memory Retrieval Phase: Yan 2.0 implements a memory sparse mechanism, selecting top-K activated memories from multiple memory slots, fusing them with long-term shared memory to generate new outputs. This allows the model not just to remember, but to “reason with its memories.”
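The two phases above can be sketched as a toy in a few lines. This is not RockAI's implementation — the slot count, the overwrite rule, and the softmax fusion are all assumptions — but it shows the shape of the mechanism: a least-used slot is partially erased and rewritten, and retrieval fuses the top-K activated slots with a long-term shared memory.

```python
import numpy as np

rng = np.random.default_rng(1)
num_slots, d = 8, 16
memory = rng.standard_normal((num_slots, d))  # per-slot memories
shared = rng.standard_normal(d)               # long-term shared memory

def update(slot_scores: np.ndarray, new_info: np.ndarray, rate: float = 0.5):
    """Memory update: partially erase the least-used slot, write new info in."""
    victim = int(np.argmin(slot_scores))
    memory[victim] = (1 - rate) * memory[victim] + rate * new_info

def retrieve(query: np.ndarray, k: int = 2) -> np.ndarray:
    """Memory retrieval: select top-K activated slots, fuse with shared memory."""
    scores = memory @ query
    top = np.argsort(scores)[-k:]                              # K most activated
    w = np.exp(scores[top]) / np.exp(scores[top]).sum()        # softmax over K
    return w @ memory[top] + shared                            # fused readout

q = rng.standard_normal(d)
out = retrieve(q)              # (d,)-shaped fused memory readout
update(memory @ q, new_info=q)  # fold the new experience back into a slot
```

The key property the sketch preserves is that both reading and writing touch only a handful of slots, which is what keeps the update cheap enough for on-device backpropagation.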

“The model can be used (inference) while it is learning (training),” Zou explains in the Geek Park interview, “directly writing newly learned preferences into the model’s neural network itself.”

This creates something unprecedented in edge AI: models that grow and evolve with their users, becoming more personalized over time without ever connecting to the internet. The robotic dog learning new tricks wasn’t following a pre-programmed routine — it was genuinely updating its neural pathways based on human demonstration.

Compression vs. Growth

RockAI’s vision challenges the current AI development paradigm in a fundamental way. Most large language models follow what the company calls “compression intelligence” — like giant sponges that absorb vast amounts of internet data during training, then serve as static repositories of compressed knowledge.

This approach faces obvious limitations when applied to resource-constrained edge devices. Text compresses better than images or video, which explains why most small-parameter Transformer models struggle with multimodal tasks. The computational overhead of the attention mechanism makes it nearly impossible to achieve sophisticated multimodal understanding with billions rather than hundreds of billions of parameters.

But RockAI argues for a different scaling law entirely: “compression intelligence + autonomous learning.” Rather than building bigger sponges, they’re creating smaller brains that can grow.

“Real intelligence shouldn’t just be compression,” Liu argues at WAIC 2025. “It should be growth and learning. Human brains don’t start with all knowledge pre-installed — they develop through interaction with the environment.”

Through compartmentalized activation, Yan models can theoretically scale to hundreds of billions of parameters while maintaining low power consumption by activating only the relevant 3% for any given task. This architectural approach suggests a different scaling law: instead of pre-training ever-larger models, you deploy smaller models that grow through real-world interaction.

More importantly, this approach enables something impossible with current architectures: true multimodal understanding at small scale. Yan 2.0 Preview, with just 3 billion parameters, can handle text, images, and audio simultaneously, running on a Raspberry Pi at 5 tokens per second — a feat that would be impossible for any Transformer-based model of similar size.

The Computational Divide

The limitations of current approaches become clear when we examine the physics of edge deployment. RockAI’s experience with hardware manufacturers reveals the brutal realities that theoretical discussions often ignore.

“One of our clients wanted to deploy AI capabilities on smartphones,” Zou recounts in the Geek Park interview, “but every other AI company demanded the latest flagship Qualcomm chips with 16GB or more memory. The reality is that most smart devices can’t support such high-end computational hardware.”

This creates what RockAI calls the “computational divide” — no matter how advanced your AI technology, if it only works on the most expensive devices, it fails to achieve the democratization that true artificial intelligence requires.

The power consumption challenge is equally severe. Smartphone manufacturers consistently report that attempting to deploy large models causes serious overheating — a universal problem with Transformer-based architectures. Several major mobile manufacturers have privately shared this pain point with RockAI, expressing frustration that their AI phone ambitions are blocked by fundamental energy constraints.

The human brain analogy is instructive here. With approximately 80–90 billion neurons, the brain is roughly comparable in scale to an 80–90 billion parameter model. If it activated all of those neurons simultaneously, it would require an estimated 3,000–4,000 watts of power. Instead, it consumes less than 30 watts by selectively activating only the relevant neural regions.
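Taking the article's figures at face value, and assuming power scales roughly linearly with the fraction of neurons firing, the implied activation rate is easy to work out:

```python
full_activation_watts = 3_500.0  # midpoint of the 3,000-4,000 W figure above
actual_budget_watts = 30.0       # the brain's observed power draw

# Under a linear power-vs-activation assumption, the brain's budget implies
# that well under 1% of neurons are firing at any given instant:
implied_active_fraction = actual_budget_watts / full_activation_watts
print(f"{implied_active_fraction:.2%}")  # → 0.86%
```

The linear-scaling assumption is a simplification, but it conveys why selective activation, rather than raw parameter count, is the variable that matters for edge power budgets.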

“This is exactly what our compartmentalized activation achieves,” Yang Hua explains in the Qbitai interview. “Instead of lighting up the entire model, we activate only the specific regions needed for each task, achieving brain-like efficiency.”

Hidden Demand for Offline Intelligence

While the technology world obsesses over cloud-based AI capabilities, RockAI has discovered something surprising: significant market demand for truly offline intelligence already exists, particularly in global markets where three factors create compelling business drivers.

Privacy Imperatives: In Europe and North America, data privacy isn’t merely a user preference — it’s often a legal requirement embedded in regulations like GDPR. For manufacturers of toys, educational devices, or personal electronics, keeping user data local isn’t optional. RockAI is currently negotiating with a major toy IP company whose primary requirement is that no user privacy data ever reaches the cloud.

Network Unreliability: Outside major urban centers, reliable high-speed internet remains inconsistent. For manufacturers selling globally, dependence on cloud connectivity severely limits market reach. RockAI’s customers frequently serve users in African wilderness areas, Southeast Asian islands, and other regions where network availability cannot be guaranteed.

Economic Efficiency: At scale, local processing often proves more economical than per-query cloud API costs. For devices that might make thousands of AI requests daily, the arithmetic becomes compelling — hardware costs are paid once, while cloud costs accumulate indefinitely.
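The break-even arithmetic is straightforward. Every number below is a hypothetical assumption for illustration, not RockAI's pricing data:

```python
# Hypothetical figures: all three inputs are illustrative assumptions.
queries_per_day = 2_000        # a chatty device making frequent AI requests
cloud_cost_per_query = 0.0005  # dollars per API call (assumed)
extra_hardware_cost = 15.0     # one-time on-device compute/memory premium

daily_cloud_cost = queries_per_day * cloud_cost_per_query  # $1.00 per day
break_even_days = extra_hardware_cost / daily_cloud_cost
print(break_even_days)  # → 15.0
```

Under these assumptions the hardware premium pays for itself in about two weeks, after which the cloud bill keeps accruing while the local cost stays at zero.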

RockAI’s current customer base reflects these realities. They’ve secured production orders for AI PCs and tablets that will ship to overseas markets in the second half of 2025, enabling manufacturers to retrofit AI capabilities onto existing hardware through over-the-air updates. Robotics companies require real-time response without network latency. Drone manufacturers need reliable AI in environments where connectivity is impossible.

Gaming as Intelligence Testing

The gaming demonstrations at WAIC 2025 weren’t mere spectacle — they represented sophisticated cognitive benchmarks. Games require rapid decision-making, strategic planning, visual processing, and real-time adaptation to changing conditions. Successfully demonstrating these capabilities running entirely on local hardware proved that Yan’s architecture could handle the kind of complex, real-time reasoning that AGI applications will eventually require.

The robotic hands playing Tetris-style games and mining simulations had to simultaneously:

  • Process visual input to understand game state
  • Plan optimal moves several steps ahead
  • Execute precise motor controls
  • Adapt strategies based on changing game conditions
  • Learn from failed attempts to improve performance

All of this occurred with just 3 billion parameters running locally on modest hardware — a feat that would require orders of magnitude more computational resources using traditional Transformer architectures.

The learning robotic dog represented an even more significant milestone. Real-time motor skill acquisition based on human demonstration requires:

  • Multimodal sensory processing (visual, proprioceptive)
  • Motion pattern recognition and encoding
  • Motor planning and execution
  • Neural pathway updating for retention
  • Behavioral reproduction with appropriate timing

The entire pipeline operated completely offline. New knowledge was directly encoded into the model’s neural weights rather than stored in external databases. This represents a fundamentally different approach to artificial intelligence.

Beyond Transformers

RockAI’s approach arrives at a pivotal moment in AI development. The industry is beginning to question whether Transformer scaling represents the only path forward, with even Google — the birthplace of the Transformer — recently introducing alternative architectures like Mixture-of-Recursions (MoR), which halves memory requirements while doubling inference speed.

The skepticism extends to the very architects of modern AI. Yann LeCun, Meta’s Chief AI Scientist and Turing Award co-winner, has made blunt statements about current limitations, arguing that auto-regressive LLMs are “exponentially diverging diffusion processes” where sequential token generation creates cascading errors through long sequences.

“The industry is collectively asking whether Transformer architecture has reached a bifurcation point,” observes Yang Hua in the Qbitai interview. “The emergence of various hybrid architectures reflects a subconscious industry response that the current approach isn’t sufficient anymore.”

This shift reflects deeper tensions in AI development. While Transformers continue advancing on cloud-based benchmarks, their fundamental design assumptions make them poorly suited for the edge computing scenarios that represent AI’s ultimate deployment target. Every smartphone, smart car, robotics system, and IoT device represents a potential AI endpoint that current architectures cannot effectively serve.

RockAI positions itself not as opponents of Transformer technology, but as pioneers of complementary approaches optimized for different contexts. “Transformers excel in cloud environments with unlimited resources,” Liu Fanping explains in a statement during WAIC 2025. “But AI must eventually run everywhere, and that requires fundamentally different architectural approaches.”

The Vision: Collective Intelligence and Distributed AGI

Beyond immediate commercial applications, RockAI harbors a more ambitious vision: collective intelligence as a pathway to AGI. Rather than pursuing ever-larger centralized models, they envision networks of specialized edge devices that learn, collaborate, and evolve together.

“Human society demonstrates the power of collective intelligence,” Liu Fanping explains during his WAIC 2025 presentation. “Individuals develop specialized expertise, and collaboration amplifies capabilities. We believe intelligent devices should follow the same pattern.”

This vision imagines AI models sharing learned capabilities through neural pathway migration or task ability synchronization, creating organized, specialized, feedback-driven model communities. Instead of a single superintelligent system, the future might feature countless device “brains” interconnected and co-evolving.

Such distributed intelligence offers several advantages over centralized approaches:

  • Resilience: No single point of failure
  • Privacy: Sensitive data never leaves local devices
  • Specialization: Different models optimized for specific tasks
  • Scalability: Growing capabilities through network effects
  • Accessibility: AI available regardless of connectivity

Challenges and the Contrarian Bet

RockAI faces formidable challenges. Technology giants possess vast resources to potentially solve Transformer optimization through hardware acceleration or novel chip designs. Moore’s Law continues advancing, potentially enabling mobile processors to run larger models efficiently within years.

However, RockAI’s leadership believes their architectural advantages will persist and amplify as hardware improves. More powerful chips would enable larger Yan models with even more sophisticated capabilities, while energy efficiency and autonomous learning features would remain advantageous regardless of computational improvements.

“We’re not just building a better edge AI solution,” Yang Hua clarifies in the Qbitai interview. “We’re developing the foundational architecture for how AI will eventually integrate into every aspect of human life.”

The company’s “contrarian bet” reflects deeper convictions about AI’s future. While most pursue AGI through scaling existing approaches, RockAI demonstrates that architectural diversity remains not just viable but essential. Their commercial success suggests multiple pathways to widespread AI adoption may be necessary, each optimized for different contexts and constraints.

The Road Less Taken

Building non-Transformer architectures requires more than technical innovation — it demands reconstructing entire AI ecosystems. Current tools, libraries, training frameworks, and hardware optimizations assume Transformer architectures. RockAI must essentially rebuild the software stack from foundations up.

“This road is difficult and lonely,” admits Zou Jiasi in his candid Geek Park interview. “You’re working against an entire industry’s technical inertia, rebuilding tool chains, communities, and overcoming cognitive costs associated with new architectures.”

The company’s persistence reflects what they call their “obsession gene” — an unwavering belief that models must run on edge devices to achieve AI’s true potential. This conviction sustained them through two years of quiet development while the industry celebrated cloud-based breakthroughs.

“Our moat isn’t any specific technical feature,” Liu Fanping reflects in the Geek Park interview, “because smart people and teams are abundant. Our moat is the accumulated knowledge from navigating uncharted territory, and our distinctive innovation gene that’s been optimized for edge intelligence from day one.”

A Different Kind of Intelligence

As AI development accelerates at breakneck pace, RockAI’s approach offers something increasingly rare: a fundamentally different perspective on what intelligence means and how it should develop. Rather than pursuing human-level performance through massive parameter counts, they’re exploring how AI systems might grow and adapt more like biological intelligence — through experience, memory, and gradual development.

The implications extend beyond technical specifications. If AI systems can genuinely learn and remember through interaction rather than requiring periodic retraining on massive datasets, the entire economic model of AI development shifts. Instead of centralized training requiring enormous computational resources, intelligence could develop gradually, locally, and personally.

This vision resonates with growing concerns about AI centralization, energy consumption, and privacy. While cloud-based models demonstrate impressive capabilities, they also concentrate power in the hands of few organizations with sufficient resources to train and operate them. Edge-based intelligence that learns and grows locally offers a more democratized alternative.

The Long View

Looking beyond immediate technical achievements, RockAI’s work represents something valuable for AI development: proof that alternative approaches remain viable and potentially superior for specific applications.

In a field moving as rapidly as artificial intelligence, maintaining architectural diversity isn’t just academically interesting — it’s strategically essential. The challenges that seem intractable with current approaches might yield to completely different architectures and paradigms.

“If we look beyond this week’s new model releases and benchmark rankings,” reflects Yang Hua in the Qbitai interview, “taking a ten or even thirty-year perspective on today’s developments, perhaps the light that truly illuminates this deep night of AI competition won’t be the brightest flame burning now, but something that later becomes recognized as the spark that started everything.”

Originally published on Medium.