Intelligence Untethered: The Hardware Bottleneck Blocking Physical AI

The first wave of AI lived on screens. Chatbots, image generators, coding assistants—intelligence confined to the digital realm, accessed through keyboards and touchscreens. That era isn't ending, but a second wave is arriving alongside it. And this one has legs. Literally.

At CES 2026, Jensen Huang declared that "the ChatGPT moment for physical AI is here—when machines begin to understand, reason, and act in the real world." It wasn't hyperbole. In 2025, Figure AI's humanoid robots completed an 11-month deployment at BMW's Spartanburg plant, running 10-hour shifts, handling over 90,000 parts, and contributing to the production of more than 30,000 vehicles [Figure AI]. Agility Robotics' Digit secured commercial agreements with GXO, Amazon, and Toyota Motor Manufacturing Canada [Agility Robotics]. Waymo's robotaxis expanded across multiple U.S. cities [Benzinga]. More than $6 billion in venture capital flowed into robotics and physical AI companies in the first seven months of 2025 alone [SiliconANGLE].

Bank of America calls it a trillion-dollar transformation. Citi Research sees it at an inflection point. Goldman Sachs reports that humanoid manufacturing costs dropped 40% between 2023 and 2024 [Deloitte]. The physical AI market—valued at roughly $5 billion in 2025—is projected to exceed $49 billion by 2033, growing at over 32% annually [SNS Insider].

The numbers are impressive. The ambition is staggering. And the hardest problem isn't the AI models. It's where they have to run.

The World Is Not a Data Center

Physical AI is a new class of AI deployment for one simple reason: the computer has to move.

In a data center, the AI system is power-constrained—as we discussed in our previous post—but at least it's plugged into the grid. It sits in a room with climate-controlled cooling, redundant power, and the option to add more GPUs when a model demands it. The constraints are real, but they're engineering and economic problems with known solutions on known timescales.

A robot on a warehouse floor, a drone in the sky, or an autonomous vehicle on a freeway has none of that.

A delivery robot with a 1 kWh battery and a total system power budget of 50–100 watts faces a zero-sum tradeoff: every watt consumed by neural network inference is a watt not available for locomotion. An agricultural inspection drone running multi-spectral imaging and crop analysis models has perhaps 20–30 minutes of flight time; dedicating significant power to AI processing cuts that window directly. Even for vehicles — the most power-rich physical AI platform — a University of Michigan and Ford study found that autonomous driving equipment increases onboard power consumption by 3–5%, with larger sensor packages pushing higher [IEEE Spectrum]. For smaller platforms like drones and robots, where the compute-to-locomotion power ratio is far less favorable, the impact is already much more severe. And as AI workloads scale toward more capable reasoning models, these fractions will only grow. Across every physical AI form factor, the equation is the same: compute competes with the mission.
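
That zero-sum tradeoff is easy to sketch in a back-of-envelope model. The numbers below (battery capacity, hover power, AI power draw) are illustrative assumptions, not measurements from any specific platform:

```python
# Back-of-envelope endurance model: every watt of AI inference is a watt
# not available for the mission. All numbers are illustrative assumptions.

def flight_time_minutes(battery_wh: float, locomotion_w: float, ai_w: float) -> float:
    """Endurance of a battery-powered platform, ignoring conversion losses."""
    return battery_wh / (locomotion_w + ai_w) * 60.0

# A hypothetical small inspection drone: ~100 Wh battery, ~200 W to stay airborne.
print(f"5 W AI:  {flight_time_minutes(100, 200, 5):.1f} min")   # 29.3 min
print(f"50 W AI: {flight_time_minutes(100, 200, 50):.1f} min")  # 24.0 min
```

Even in this simplified sketch, moving from lightweight perception to VLA-class onboard inference costs roughly a fifth of the mission window.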

Further, this competition can create a virtuous—or vicious—cycle for your system. Let's say, generously, that only 50% of your energy budget goes to AI compute, and that the compute demands of those AI workloads have grown more than 10x over the last few years. Reducing the power consumed for an equivalent workload means you can shrink the battery and heat sinks. That dramatically reduces weight, which means smaller motors with less self-loading, giving you multiplicative efficiency benefits on locomotion, battery size, and total system mass. Now imagine the opposite chain of causality—the vicious cycle—which is where we currently stand.
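
The multiplicative benefit can be treated as a fixed-point calculation: saved compute watts shrink the battery, which lowers carried mass, which lowers locomotion power, and so on. A toy model, with every coefficient an invented illustration:

```python
# Toy model of the virtuous cycle: cutting AI power shrinks the battery,
# which cuts carried mass, which cuts locomotion power, and so on.
# All coefficients are illustrative assumptions, not measured values.

def converged_system_power(ai_w: float,
                           base_locomotion_w: float = 150.0,
                           lift_w_per_kg: float = 20.0,      # extra power per kg carried
                           battery_kg_per_w: float = 0.01) -> float:
    """Iterate battery mass and lift power until they settle at a fixed point."""
    total = base_locomotion_w + ai_w
    for _ in range(50):  # converges quickly; 50 iterations is overkill
        battery_kg = battery_kg_per_w * total
        total = base_locomotion_w + ai_w + lift_w_per_kg * battery_kg
    return total

print(f"50 W AI: {converged_system_power(50):.1f} W total")  # 250.0 W
print(f" 5 W AI: {converged_system_power(5):.1f} W total")   # 193.8 W
```

Under these made-up coefficients, cutting AI power by 45 W saves about 56 W of total system power: the 45 W directly, plus the knock-on savings from a lighter battery and smaller motors.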

And while power is the first constraint, thermal management is often the binding one.

There are no server-room cooling towers in a drone's airframe, a robot's torso, or a vehicle's compute module. The thermal design has to fit within the form factor of the system—compact, sealed, often passively cooled. A chip that can burst to 100 watts for a few seconds but must throttle to 15 watts sustained is useless for continuous perception workloads. Whether it's a warehouse robot running 10-hour shifts in an un-air-conditioned facility or an autonomous vehicle fleet operating in Phoenix summer heat [Ampere], thermal constraints dictate what the AI can actually do in sustained operation.

And then there's the constraint that digital AI barely thinks about: reliability. A cloud model can tolerate occasional failures—if a request times out, the system retries. A drone conducting infrastructure inspection over a bridge cannot tolerate a gap in its obstacle avoidance system. A surgical robot cannot pause mid-procedure to wait for a cloud response. An autonomous vehicle at highway speed cannot accept 500 milliseconds of perception blackout. Physical AI systems require not just average performance but guaranteed worst-case latency. There is no "retry" when the physical world is moving in real time.

These constraints have driven physical AI platforms toward increasingly heterogeneous architectures—CPUs, GPUs, NPUs, and domain-specific accelerators working in concert, with platforms now dedicating 25% or more of their silicon specifically to neural network acceleration [GSA]. This has become the standard approach from automotive SoCs to drone flight controllers to industrial robot brains.

The problem is that the AI workloads these systems need to run are evolving far faster than the hardware designed to support them. In conversations with leading drone and robotics companies, we have heard a consistent message: the AI share of onboard compute is growing exponentially relative to everything else. Navigation, communication, motor control—those workloads are essentially flat. It's the perception, planning, and reasoning pipelines that are consuming an ever-larger fraction of the power and compute budget, with no sign of slowing down.

The Reasoning Revolution Hits the Real World

Five years ago, a typical physical AI perception stack used convolutional neural networks with tens of millions of parameters, processing 720p video feeds. The job was relatively straightforward: detect objects, classify them, and estimate distances. Advanced, certainly. But fundamentally, pattern matching.

The emergence of Vision-Language-Action (VLA) models represents a significant shift in how machines interact with the physical world. These aren't just perception systems—they're foundation models that see, understand natural language, reason about their environment, and generate physical actions, all within a unified architecture. Google DeepMind's Gemini Robotics, NVIDIA's GR00T N1, Figure AI's Helix, Physical Intelligence's π0—these models combine vision transformers, large language model backbones, and action decoders into systems with hundreds of millions to billions of parameters [VLA-Wikipedia].

Consider what a modern physical AI system must do. A drone surveying a solar farm needs to fuse camera, thermal, and LiDAR data in real time; detect and classify panel defects at high speed; adjust its flight path dynamically for obstacles and wind conditions; and make on-the-fly decisions about which areas need closer inspection—all within a 15-watt power budget and a 25-minute flight window. An autonomous vehicle must build a 3D world model from a dozen sensors, predict the trajectories of every nearby agent, and update it all at 10–30 frames per second [NVIDIA-Thor]. A warehouse robot navigating alongside human workers has to continuously map its environment, anticipate human movement, and replan its path in milliseconds.

But perception is just the beginning.

A warehouse robot encountering an unexpected obstacle shouldn't just stop; it should reason about alternative paths, predict the likely behavior of nearby humans, consider whether the obstacle is likely to move, and make an intelligent decision in real time. A humanoid robot on a factory floor needs to interpret a spoken instruction—"move that bin to the loading dock, but watch for the forklift"—and translate it into a coordinated sequence of whole-body actions. Consider a delivery drone arriving at a residence: the flight plan says land in the driveway. But there's a truck parked there. The drone can't just abort—it needs to reason about an alternative landing zone, evaluate the front lawn, and assess whether there are children playing, a dog running loose, or a sprinkler head that could damage the package. That's not perception. That's judgment.

This is what VLA models enable. And beyond VLAs, our fundamental conviction—shared by many in the industry—is that physical AI requires a new class of AI architectures, distinct from those behind chatbots and image generators. World models—systems designed to learn how reality works rather than predict the next word—are attracting some of the largest funding rounds in AI history [TechCrunch-AMI]. Leading researchers argue that language-based approaches alone will never produce machines capable of reasoning about the physical world. The capital flowing into this space signals that the industry expects physical AI compute demands to get dramatically more intense, not less.

And this is why the compute requirements have already exploded.

Recent research characterizing VLA workloads on NVIDIA's Jetson Orin and Thor edge platforms found that up to 75% of end-to-end inference latency is consumed by the memory-bound action generation phase [Sartor-VLA]. The models need to scale to 10–100 billion parameters for true general-purpose capability, yet safe manipulation in physical environments requires a consistent control frequency of at least 10–20 Hz. Today's edge accelerators—even the best ones—are structurally ill-equipped for the sparse, memory-bound autoregressive processing that VLA action generation demands.
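
The arithmetic behind that structural mismatch is easy to sketch: if the weights don't fit on-chip, each autoregressive control step must stream them from memory, so model size times control rate sets a floor on sustained bandwidth. The sketch below assumes int8 weights and one full pass per step, an illustrative simplification rather than a measured workload:

```python
# Why autoregressive action generation hits the memory wall: if the weights
# can't stay on-chip, every control step must stream them from DRAM.
# Assumes int8 weights (1 byte per parameter) and one full weight pass per
# step -- an illustrative simplification, not a profile of any real model.

def required_bandwidth_gbps(params_billions: float, control_hz: float) -> float:
    """Minimum sustained weight traffic in GB/s."""
    return params_billions * control_hz  # GB per pass x passes per second

print(required_bandwidth_gbps(10, 20))  # 10B params at 20 Hz -> 200 GB/s
```

Two hundred gigabytes per second of sustained weight traffic is on the order of the peak memory bandwidth of today's strongest edge platforms—before counting activations, KV caches, or any of the other models running concurrently.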

VLAs, though, are only part of the picture. In practice, a physical AI system doesn't run a single monolithic model—it runs a heterogeneous blend of workloads. Alongside the large reasoning models sits a long tail of smaller, high-frequency perception functions: vision pipelines for object detection, audio processing for environmental awareness, sensor fusion CNNs, and safety monitoring loops. These models are individually lightweight but run at much higher frequencies—often hundreds of hertz—and collectively consume a significant share of the compute and power budget. The real workload profile of a humanoid robot or a delivery drone isn't one big model. It's dozens of models of varying sizes, running concurrently, at different frequencies, all competing for the same constrained resources.
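
To make that concrete, here is a sketch of such a mix. Every model name, per-invocation cost, and rate below is invented purely for illustration, not a profile of any real platform:

```python
# A hypothetical concurrent workload mix for a physical AI platform.
# Names, per-invocation costs (GMACs), and rates are invented illustrations.

workloads = [
    # (name, GMACs per invocation, invocations per second)
    ("vla_reasoning",   2000.0,  10),   # large model, low rate
    ("object_detector",    5.0, 200),   # small models, high rate
    ("sensor_fusion",      1.0, 400),
    ("audio_awareness",    0.5, 100),
    ("safety_monitor",     0.2, 500),
]

total_gmacs_per_s = sum(g * hz for _, g, hz in workloads)
for name, g, hz in workloads:
    print(f"{name:16s} {g * hz / total_gmacs_per_s:6.1%}")
print(f"total: {total_gmacs_per_s / 1000:.2f} TMAC/s")
```

Even in this toy mix, the reasoning model dominates the raw operation count while the high-rate perception loops set the latency deadlines, so the hardware has to serve both profiles simultaneously.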

This means the hardware solution can't be optimized for just one class of model. It needs to be efficient across the full spectrum—from the billion-parameter VLA reasoning at 10 Hz to the small CNN running object detection at 200 Hz.

In other words, the AI models are ready for the physical world. The hardware to run them at the edge isn't.

The Compute Gap at the Edge

To understand the gap in the hardware, we need to look beyond the impressive demos and billion-dollar funding rounds. Most cutting-edge physical AI still depends on cloud connectivity.

For example, Google DeepMind's Gemini Robotics uses a "two-brain" architecture where the heavy reasoning model runs on offboard servers while only rudimentary control loops run locally [Gemini-Robotics]. This works very well for lab demonstrations but is unworkable for fully autonomous systems operating in the real world, where latency is measured in milliseconds and network connectivity is never guaranteed.

The industry knows this. That's why there's an intense push toward on-device inference—Google released Gemini Robotics On-Device specifically to address this gap [Gemini-OnDevice]. Hugging Face's SmolVLA showed that a compact 450-million-parameter model can match the performance of much larger VLAs [SmolVLA]. The race to compress, quantize, and optimize these models for edge deployment is well underway.

But while optimization can help, the fundamental problem isn't that the models are too big but that the hardware architecture is wrong for the workload.

The Memory Wall—Again

Every time a neural network performs an operation on a conventional processor, data must travel from memory to the compute unit and back. This journey—seemingly trivial—is what dominates the energy budget and latency of AI inference. As I've written about on AI Afterhours, the primary challenge for AI computation isn't the math—it's the energy cost of moving data [AI Afterhours - Inference]. In a data center, that's an efficiency problem. In a physical AI system running on a battery, it's existential. This "Memory Wall" becomes the defining constraint.

The arithmetic is brutal. Moving a single byte of data from off-chip DRAM consumes roughly 200x more energy than performing an 8-bit multiply-accumulate operation. A VLA model running continuous inference at 20 Hz on an edge platform doesn't run out of compute—it runs out of memory bandwidth. The processors sit idle, waiting for data to arrive.
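
To put rough numbers on that arithmetic, here is an energy-accounting sketch. The per-operation energies are illustrative orders of magnitude built around the ~200x ratio above, not figures for any particular chip:

```python
# Rough per-inference energy accounting for a memory-bound model.
# Per-operation energies are illustrative orders of magnitude (~0.2 pJ per
# 8-bit MAC, ~200x that per DRAM byte), not figures for any specific chip.

PJ_PER_MAC = 0.2
PJ_PER_DRAM_BYTE = 200 * PJ_PER_MAC  # the ~200x ratio cited above

def inference_energy_mj(weight_bytes: float, macs: float):
    """(compute_mJ, dram_mJ), assuming every int8 weight streams from
    off-chip DRAM once per step with no on-chip reuse."""
    return macs * PJ_PER_MAC * 1e-9, weight_bytes * PJ_PER_DRAM_BYTE * 1e-9

# A 1B-parameter model, ~1 MAC per weight per autoregressive step:
compute_mj, dram_mj = inference_energy_mj(1e9, 1e9)
print(f"compute: {compute_mj:.1f} mJ, DRAM: {dram_mj:.1f} mJ")
# -> compute: 0.2 mJ, DRAM: 40.0 mJ
```

Under these assumptions, data movement costs 200x the arithmetic, and at a 20 Hz control rate weight movement alone approaches a watt—before activations, before any other concurrent model, before anything else the system does.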

This is why the recent characterization of VLA workloads found the action generation phase so overwhelmingly memory-bound [Sartor-VLA]. The bottleneck isn't floating-point throughput. It's data movement. And no amount of process node shrinkage or architectural optimization within the conventional compute paradigm will solve it, because the problem is structural: memory and compute are in different places, and the physics of moving data between them imposes a floor on latency and energy consumption.

This is another way physical AI differs fundamentally from a data center, where you can paper over memory bandwidth limitations with parallelism—spread the model across multiple GPUs, use high-bandwidth interconnects, and accept higher power consumption. On a robot or in a vehicle, you have none of those luxuries. You have a fixed power budget, a fixed thermal envelope, and a hard real-time deadline. The Memory Wall isn't just a performance issue here. It's a deployment blocker.

The inspection drone that could run a state-of-the-art VLA on-device within a 10-watt envelope would transform infrastructure monitoring. The humanoid robot that could perform real-time whole-body reasoning without cloud connectivity would unlock use cases that today exist only in research labs. The autonomous vehicle that could run frontier perception and planning models entirely on chip would finally close the gap between demo and deployment.

These aren't capability problems. They're efficiency problems. And they require a fundamentally different approach to computation.

What If Compute Minimized Data Movement?

The premise of conventional computing is that data lives in one place and gets processed in another. What if you could minimize—or eliminate—that movement entirely?

At EnCharge AI, we've built our answer around a concept we call Virtualized In-Memory Compute™. The name is deliberate—and the analogy matters. Decades ago, the computing industry faced a seemingly intractable problem: how to manage physical memory across applications that each assumed they had the whole machine to themselves. The answer was virtualized memory—an abstraction layer that gave software a seamless, unified address space regardless of where data physically resided. It was one of computing's most consequential innovations. Virtualized In-Memory Compute applies the same principle to AI computation: rather than locking inference to a single level of the memory hierarchy, VIMC enables compute-in-memory across the full stack, abstracting away where computation physically happens.

To understand why this matters, consider the memory hierarchy itself—a series of concentric rings around the processor. At the innermost ring sits SRAM—small, fast, and tightly integrated on-chip. The next ring out is on-chip L2 memory—larger, but more energy-expensive to access. Further out sits off-chip DRAM, which itself comes in different flavors optimized for different tradeoffs: high-bandwidth variants for throughput-hungry workloads, high-capacity variants for larger models, and so on. Each step outward trades latency and energy for capacity. For the most power- and thermally-constrained physical AI systems, the inner rings are where the action has to happen—but as platforms scale up in capability, the ability to extend compute-in-memory to the outer rings becomes equally important.

VIMC brings computation into those rings. At the innermost level, SRAM-based in-memory compute places neural network weights directly in the SRAM cells themselves—the lowest-latency, lowest-energy level of the hierarchy. This is where the most latency-critical inference happens: perception loops, real-time control decisions, safety-critical responses that can't wait for data to travel off-chip. Moving outward, VIMC extends through on-chip L2 memory for mid-sized models, to DRAM integration for the larger reasoning models that bigger platforms demand.

And critically, VIMC isn't just about embedding compute at each level—it's about redesigning the memory system hierarchy itself. The conventional hierarchy was designed for general-purpose computing, not for the sustained, structured data flows of neural network inference. VIMC rethinks both the compute and the interconnect between those in-memory compute rings, enabling high-efficiency, high-bandwidth data paths that the traditional architecture was never built to provide.

The physics that makes this work is capacitor-based analog computing, fabricated in standard CMOS. Model weights are stored in standard SRAM cells, and capacitors fabricated above each cell perform the computation directly on those weights. When inputs arrive, the physics of the circuit executes the multiply-accumulate operations in place—no reading weights out, no moving data across the hierarchy. No Memory Wall. Why capacitors? The physics of capacitors in standard CMOS technologies is intrinsically and exceedingly precise—demonstrated over decades in extreme-precision 20-bit analog-to-digital converters used in MRI machines, aircraft, and automobiles. Our innovation was figuring out how to harness this precision for AI compute, giving us a fundamentally different reliability profile than alternatives based on resistive memory or phase-change materials. And because it builds on mature CMOS manufacturing, it integrates naturally with existing semiconductor supply chains.

Why does all of this matter for physical AI? Because, as we've seen, these systems don't run a single model—they run dozens concurrently, from billion-parameter VLAs to lightweight perception CNNs, across a range of frequencies and latency requirements. And they span platform scales: from milliwatt drone sensors to multi-watt robot brains to the more powerful compute modules in autonomous vehicles. A solution locked to one level of the hierarchy, or optimized for one class of model, is a niche. VIMC is designed for the full spectrum—and critically, it's programmable. Physical AI's heterogeneous workload profile demands an architecture that can efficiently execute any model topology, not one hardwired for a single class of computation. The same fundamental physics applies at every level—whether you're running a small CNN at 200 Hz on a drone or a multi-billion-parameter VLA at 10 Hz on a warehouse robot, the architecture adapts.

For physical AI, this translates directly. The drone that today must choose between flight time and AI capability could run sophisticated reasoning models without sacrificing its mission window. The robot that requires 30 watts for AI processing could run the same models at 3 watts—or run 10x more capable models at the same 30 watts. The vehicle that currently offloads reasoning to the cloud could perform it entirely on-device, with lower latency and higher reliability.

The Race Isn't About Building Better Robots

The remarkable thing about the current moment in physical AI is how fast the software is advancing. VLA models can control humanoid robots through natural language. Perception systems can build real-time 3D world models from multi-sensor fusion. Planning algorithms can navigate increasingly complex, dynamic environments. And every quarter, the models get more capable — and more computationally demanding.

The hardware isn't keeping pace. The models are advancing faster than the compute architectures can absorb them, and the gap is widening.

The companies that solve this — that deliver the compute efficiency needed to run frontier AI models within the power, thermal, and reliability constraints of physical systems — will enable the entire physical AI ecosystem to scale. Not incrementally. Categorically.

The robots are ready to be smarter. The drones are ready to be smarter. The models are there.

The silicon needs to catch up.

Shwetank Kumar is Chief Scientist at EnCharge AI.

References

[Figure AI] Figure AI. "F.02 Contributed to the Production of 30,000 Cars at BMW." November 2025. https://www.figure.ai/news/production-at-bmw

[Agility Robotics] Agility Robotics. "Agility Robotics Announces Commercial Agreement with Toyota Motor Manufacturing Canada." February 2026. https://www.agilityrobotics.com/content/agility-robotics-announces-commercial-agreement-with-toyota-motor-manufacturing-canada 

[Benzinga] Benzinga. "15 Physical AI Stocks to Watch in 2026." February 2026. https://www.benzinga.com/markets/tech/26/02/50430658/

[SiliconANGLE] SiliconANGLE. "Beyond Automation: Physical AI Ushers in a New Era of Smart Machines." December 2025. https://siliconangle.com/2025/12/28/beyond-automation-physical-ai-ushers-new-era-smart-machines/

[Deloitte] Deloitte. "AI Goes Physical: Navigating the Convergence of AI and Robotics." December 2025. https://www.deloitte.com/us/en/insights/topics/technology-management/tech-trends/2026/physical-ai-humanoid-robots.html

[SNS Insider] SNS Insider. "Physical AI Market Size, Share & Growth Report 2033." 2025. https://www.snsinsider.com/reports/physical-ai-market-9007

[IEEE Spectrum] IEEE Spectrum. "Exposing the Power Vampires in Self-Driving Cars." June 2021. https://spectrum.ieee.org/exposing-the-power-vampires-in-self-driving-cars 

[Ampere] Ampere Computing. "Avride Adopts Ampere to Power Its Autonomous Vehicle Technology." 2025. https://amperecomputing.com/blogs/avride-adopts-ampere-to-power-its-autonomous-vehicle-technology

[GSA] GSA. "Edge AI Computing Advancements Driving Autonomous Vehicle Potential." 2023. https://www.gsaglobal.org/forums/edge-ai-computing-advancements-driving-autonomous-vehicle-potential/

[VLA-Wikipedia] Wikipedia. "Vision-Language-Action Model." 2026. https://en.wikipedia.org/wiki/Vision-language-action_model

[NVIDIA-Thor] NVIDIA. Drive Thor Platform. https://www.nvidia.com/en-us/self-driving-cars/

[Sartor-VLA] Sartor et al. "Characterizing VLA Models: Identifying the Action Generation Bottleneck for Edge AI Architectures." arXiv, 2026. https://arxiv.org/html/2603.02271

[Gemini-Robotics] Google DeepMind. "Gemini Robotics 1.5." arXiv, 2025. https://arxiv.org/abs/2510.03342

[Gemini-OnDevice] Wikipedia. "Vision-Language-Action Model—Gemini Robotics On-Device." 2026.

[SmolVLA] Hugging Face. SmolVLA. 2025.

[TechCrunch-AMI] TechCrunch. "Yann LeCun's AMI Labs Raises $1.03 Billion to Build World Models." March 2026. https://techcrunch.com/2026/03/09/yann-lecuns-ami-labs-raises-1-03-billion-to-build-world-models/

[AI Afterhours - Inference] Shwetank Kumar. "Beyond Raw Speed: The Multi-Dimensional Chess Game of AI Inference." AI Afterhours, April 2025. https://aiafterhours.substack.com/p/beyond-raw-speed-the-multi-dimensional
