OXMIQ CEO Raja Koduri on Re-Architecting the GPU Stack: From Atoms to Agents

Feb 6, 2026

By Thomas Eugene Green

One of the things I value most about working at Oxmiq is the opportunity to learn from Raja Koduri. Recently, he sat down with Salah Nasri on the Semiconductor Leadership Podcast, and the talk felt like a masterclass on the business strategy behind Oxmiq Labs. As I listened to the conversation, it quickly became clear that rather than making incremental changes, he’s focused on making structural shifts that will reshape how the entire industry builds AI systems.

The two-hour talk is well worth your time. You can find it on Spotify or YouTube. Below are the highlights that stood out to me, and why I think the work at OXMIQ matters.

The Problem - AI Infrastructure is Hitting a Wall

Raja opened by framing the scale of the challenge. A single hyperscale AI data center now costs approximately $50 billion. The silicon budget alone, for buying the most popular GPU, is approximately $30 to $35 billion per gigawatt, and energy, power delivery, networking, cooling, and the rest add another $10 to $15 billion, depending on geography. As AI demand accelerates, these costs will rise with it, which means the industry can't build fast enough to meet demand.

Solving this requires more than faster chips. Raja emphasized that utilization, or getting more useful work out of every transistor, will determine the winners and losers over the next five years. For AI to reach the rest of the world, silicon costs need to come down by a third or more. That's part of why OXMIQ exists: to enable a broader set of players to build competitive chips and chiplet systems, which will drive down costs the same way mobile internet became affordable for a billion people in India.
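To put those figures side by side, here is a rough back-of-the-envelope sketch in Python. It uses only the per-gigawatt ranges quoted above. Pairing the low estimates together and the high estimates together, and applying the one-third reduction to silicon alone, are my own simplifying assumptions rather than anything Raja laid out in the talk, and the function names are hypothetical.

```python
# Figures quoted in the talk, in billions of USD per gigawatt.
# Pairing low-with-low and high-with-high is a simplifying assumption.
SILICON = (30.0, 35.0)         # GPU silicon budget: low, high
INFRASTRUCTURE = (10.0, 15.0)  # energy, power delivery, networking, cooling: low, high


def total_cost(silicon, infrastructure):
    """Combined cost per gigawatt."""
    return silicon + infrastructure


def silicon_share(silicon, infrastructure):
    """Fraction of the combined cost that goes to silicon."""
    return silicon / total_cost(silicon, infrastructure)


def after_silicon_cut(silicon, infrastructure, cut=1 / 3):
    """Combined cost if only the silicon portion drops by `cut` (assumed)."""
    return total_cost(silicon * (1 - cut), infrastructure)


if __name__ == "__main__":
    for label, s, i in zip(("low", "high"), SILICON, INFRASTRUCTURE):
        print(
            f"{label}: total ${total_cost(s, i):.1f}B, "
            f"silicon share {silicon_share(s, i):.0%}, "
            f"after one-third silicon cut ${after_silicon_cut(s, i):.1f}B"
        )
```

Under these assumptions, the total lands between $40 and $50 billion per gigawatt, silicon accounts for 70 to 75 percent of it, and a one-third cut in silicon cost alone brings the total down to roughly $30 to $38 billion.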

OxCore - a unified architecture

At the center of our platform is OxCore, which Raja describes as "the new computing core." It encapsulates scalar, vector, and matrix computing into one unified architecture. Today, most systems treat these as separate abstractions: CPU-style processing (scalar), GPU-style parallel processing for vectors (like CUDA), and tensor/TPU-style matrix math. OxCore unifies all three. As Raja put it, "You can think of it as a single core that encapsulates CPU, GPU, and TPU into one unified architecture." This design philosophy improves utilization and simplifies how developers think about execution.

Chiplet quilting and the need for real standards

Raja mentioned that chiplets are often discussed as the future of scalable silicon, but as he put it, "Everyone who is super excited about chiplets are the ones who haven't done them yet." He speaks from experience, having driven chiplet integration from AMD’s HBM1 all the way through Intel's 47-chiplet Data Center GPU, Ponte Vecchio.

The lesson? Standards alone won't solve the problem. As Raja said, "You doing a chiplet and me doing a chiplet and expecting them to come together and work, if you just have a standard? No, no, no. Even within my own team [at Intel], there were challenges." High-bandwidth chiplet systems introduce complex power, thermal, and validation challenges. The higher the bandwidth between two chiplets, the harder it is to standardize, because the heat generated becomes problematic.

Our chiplet quilting approach recognizes that plug-and-play integration requires more than innovation. It demands standardization, disciplined execution, and continual validation. That is what we are trying to build, so that we can align with our partners on architecture, power, thermals, and testing across the entire stack.

From Agents to Atoms

This part of the conversation stuck with me most. Raja called it "probably the most visionary or most profound thing" in the entire discussion.

The idea is that there are layers of abstraction (programming languages, frameworks, drivers, runtimes, etc.) that sit between an AI agent generating work and the silicon executing it. Those layers are built by humans, for humans, but agents aren't humans, so couldn't they communicate directly with silicon compute blocks?

As Raja puts it, "They don't need to talk through Python, C, all these intermediate languages. We created them for humans to program. But when it's an agent generating work, there will be new, more efficient forms of communication, where the agent can talk to what I call nano-agents in silicon directly."

He went on to say, "You can express an entire inference model in a single page of math equations. Why am I breaking that down into tens of thousands of lines of code, through all the layers of the stack, just to ask the atoms to wiggle and perform a math computation? What if the future hardware just talks math?"

This is the long-term direction OXMIQ is architecting toward. We believe that over time, those layers will thin as systems evolve to express intent more directly to hardware. Fewer translations mean lower latency, less energy lost, and higher efficiency. When asked what one thing listeners should remember about OXMIQ, Raja's answer was simple: "Think agents to atoms."

Why does Oxmiq license GPU IP?

Unlike traditional fabless companies that design and sell their own chips, OXMIQ provides licensable IP so others can build chips tailored to their specific needs. Raja stated this clearly: "There is ARM for CPUs. But there is not ARM for GPUs. Anyone can license our IP and build a chip. That's the problem we're trying to solve."

The reality is that no single chip fits every workload. "Apple's constraints on an iPhone were different than other smartphones, power, thermal, sensors. Me doing a single chip that can satisfy everybody is not going to happen. Even Nvidia can't cover the entire range."

Licensing IP allows partners to build silicon that’s tuned to their own requirements. OxCore can scale from edge to datacenters, robotics to automotive, all while benefiting from a common architecture and software ecosystem. Raja gave this analogy: "What happened in the mobile space? Because ARM was available, Apple was able to design their own chip. Imagine if the only CPU company was Intel. The iPhone would have never happened."

OxCapsule beta: reducing developer friction

Raja also discussed our first software product, OxCapsule, which we use every day and can't live without. We launched the public beta in November 2025 with V1.0 for Windows and Mac, and released V2.2 with Linux client support in December 2025. We're currently releasing updates monthly.

The goal of OxCapsule is to give developers easy access to continuous compute across GPUs, accelerators, and CPUs. With OxCapsule, developers connect to remote GPU environments through a single interface and shift between hardware platforms without rebuilding their workflows.

So far, the beta has attracted participation from companies including ARM, AMD, Intel, Infineon, GlobalFoundries, Tenstorrent, Radisys, and others, as well as universities like Boston University, NYU, Texas A&M, IIT Hyderabad, and the University of Utah. Early feedback is helping us refine usability and performance, and it's rewarding to see the system reach real developers.

If you'd like to try OxCapsule, you can join the beta here.

Lessons learned from a career in tech

Finally, Raja spoke about his career experience working at Apple, AMD, and Intel. I think it’s important to mention this because his experience has given him a unique perspective on how compute architecture and software evolve across product cycles, market transitions, and technology shifts. Raja compared different phases of his career to stages of academic training. He described his time at Apple as transformative, saying it was “almost like going to a university,” where he learned to approach problems from a user perspective rather than through purely technical metrics. He later described his time at Intel as “the PhD phase” of his career because it brought everything together and gave him exposure to the full stack, from transistor manufacturing through packaging, systems, and software.

That combination of long-cycle industry experience, systems thinking, and willingness to rethink the stack from the ground up is exactly why I am excited to be working at Oxmiq right now.

Architecting Atoms to Agents™