
The Memory Wall: How Computational Storage is Powering the Future of AI Infrastructure

  • Writer: James Garner
  • 1 day ago
  • 6 min read


In the race toward artificial general intelligence (AGI), we often focus on the headline-grabbing advancements in GPU technology and model capabilities. But behind every successful AI system lies a critical yet often overlooked component: storage. As JB Baker, Vice President of Marketing and Product Management at ScaleFlux, explained when he joined us on the Project Flux podcast, the true bottleneck in AI infrastructure might not be where you think it is.


"Processors, and particularly GPUs, have grown exponentially in their ability to churn through data. Memory and storage have also grown exponentially in their ability to deliver data. However, the coefficient on that exponent has been a lot higher on the CPUs," Baker explains. The result? A growing gap between how much data processors can handle and how much data storage can deliver.


This gap is what industry experts call "the memory wall," and it's becoming one of the most significant challenges in scaling AI infrastructure. Baker offers a vivid analogy: "Your processors are your field, and you have seeded them. You have a giant lake full of water. That's all your data. And you have plenty of it. But the only thing you have to water that entire field is a little sprinkler can."


The Overlooked Bottleneck in AI Infrastructure

When organisations embark on AI initiatives, there's often a rush to acquire the latest GPUs without considering the entire infrastructure needed to support them. Baker has observed this firsthand: "One system guy that I was talking to is like, 'Hey, yeah, my customers just came in and they're like, we need the GPU cluster and let's get a hold of the GPUs and let's put all our money there.' And they don't think about the storage."


This oversight can lead to expensive GPUs sitting idle, waiting for data to process. It's like having a Formula 1 race car stuck in traffic—all that power and nowhere to go.


What makes AI workloads particularly challenging is that, unlike traditional data analytics, they often generate more data than they consume. "With AI, you pull in all this data, you churn through it, and you may spit out more data that has to be stored than what you brought in," Baker notes, referencing a Microsoft presentation at an Open Compute Project event. This creates additional demands on storage systems that many organisations fail to anticipate.


The energy costs are substantial too. A ChatGPT query can consume ten times the energy of a traditional Google search. One AI image generation can use as much energy as charging a smartphone. These costs multiply rapidly at scale, making efficient infrastructure critical.


Breaking Down Computational Storage

So how do we address this bottleneck? This is where computational storage enters the picture. Baker explains: "The whole concept there is to say, can we make the data pipeline and storage in particular more efficient by moving some tasks out of the central processor, whether that's an x86 or a GPU, and distributing it down closer to where the data lives?"


ScaleFlux's approach involves integrating hardware-based compression and decompression engines directly into the drive's controller. This might sound like a small technical detail, but the efficiency gains are substantial.


"Doing compression in a CPU is extremely inefficient," Baker explains. "If you run GZIP, which is a heavier compression in your x86 processor, you might get a few megabytes per second of throughput. Or in the newer server processor classes, maybe it's tens of megabytes per second." In contrast, a hardware-based solution can achieve "many gigabytes per second of throughput."


Implementing this technology wasn't without challenges. Traditional storage systems use fixed block lengths—standardised "bricks" of data, typically 4 kilobytes each. But when compression is applied, these blocks become variable in size. "We don't get that one single same-sized Lego brick," Baker explains. "We get a three by two, we get a one by two, we get a two by two, we get all these different sizes." This required developing sophisticated firmware to track not just where each block was stored, but how big it was.
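The sketch below is a deliberately simplified, hypothetical illustration of that bookkeeping problem rather than ScaleFlux's firmware: once fixed 4 KiB logical blocks compress to different physical sizes, the drive's mapping table has to record not only where each block landed but also how long it is. All names and the naive append-only placement are ours, for illustration.

```python
# Hypothetical, simplified sketch of variable-sized block tracking.
# Field names and placement policy are illustrative, not ScaleFlux firmware.
from dataclasses import dataclass

LOGICAL_BLOCK_BYTES = 4096  # the fixed-size "Lego brick" the host still sees

@dataclass
class MappingEntry:
    physical_offset: int    # byte offset on the media where the block starts
    compressed_length: int  # bytes it actually occupies after compression

class CompressedBlockMap:
    """Logical block address -> (offset, length) of the variable-sized data."""
    def __init__(self) -> None:
        self._table: dict[int, MappingEntry] = {}
        self._next_free = 0  # naive append-only placement for illustration

    def write(self, lba: int, compressed_length: int) -> MappingEntry:
        entry = MappingEntry(self._next_free, compressed_length)
        self._next_free += compressed_length
        self._table[lba] = entry
        return entry

    def read(self, lba: int) -> MappingEntry:
        return self._table[lba]

# Usage: three logical 4 KiB blocks that compressed to different sizes.
fmap = CompressedBlockMap()
for lba, size in [(0, 1024), (1, 2048), (2, 3072)]:
    fmap.write(lba, size)
print(fmap.read(1))  # MappingEntry(physical_offset=1024, compressed_length=2048)
```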

The payoff is significant: more efficient use of storage, faster data delivery to processors, and ultimately, better performance from those expensive GPUs.


The Sustainability Challenge of AI Infrastructure

Beyond performance, there's another critical dimension to AI infrastructure: sustainability. The energy consumption of data centres is growing at an alarming rate, with significant implications for our planet.


"At the global level, data centres are already around 2% of global power consumption," Baker reveals. "And over the past four years, we added, just for data centre consumption, the entire country of France. The equivalent of the entire country of France was added to the global grid over the last four years."


While global power consumption outside of data centres grows at about 1-2% per year, data centres are growing at 12-16% annually—six to eight times faster. In some regions, the concentration is even more dramatic. "Ireland and Northern Virginia... data centres already consume over 20% of the total electrical generation in those regions," Baker notes.
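As a quick back-of-the-envelope (ours, not Baker's), those growth rates compound very differently: at roughly 1-2% a year, grid demand takes decades to double, while at 12-16% a year, data centre demand doubles in around five years.

```python
# Back-of-the-envelope only: doubling time implied by a constant annual growth rate.
import math

def doubling_time_years(annual_growth_rate: float) -> float:
    return math.log(2) / math.log(1 + annual_growth_rate)

for label, rate in [("grid overall (~1.5%/yr)", 0.015),
                    ("data centres (~14%/yr)", 0.14)]:
    print(f"{label}: doubles in ~{doubling_time_years(rate):.0f} years")
# grid overall (~1.5%/yr): doubles in ~47 years
# data centres (~14%/yr): doubles in ~5 years
```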


This growth is creating challenges for utilities. "Dominion in Northern Virginia and APS down in Arizona and others... they've already said, we can't bring on new power generation capacity as fast as you guys on your data centres want to scale. So those data centre projects are getting delayed."


In response, major tech companies are taking matters into their own hands. Microsoft is bringing back one of the reactors at Three Mile Island, while Sam Altman is investing in nuclear fission and solar energy. These companies are planning for "adjacent or co-located power plants" alongside their data centres, though this approach comes with regulatory challenges and "NIMBY" (Not In My Back Yard) concerns.


Cooling technologies are evolving too, moving from traditional air cooling with fans to more efficient liquid cooling and immersion cooling, where entire servers are submerged in non-conductive liquid. But even these solutions create new challenges, such as water consumption and the need for data centres with sufficient structural integrity to support the weight of liquid vats.


The Future of AI Infrastructure

Looking ahead, Baker sees a future of increasing specialisation and fragmentation in AI infrastructure. "Instead of just being general purpose GPUs that are used for the AI processing, the Blackwells and all the NVIDIA processors, you will see specialised processors," he predicts.


Meta and potentially OpenAI are developing their own AI chips optimised specifically for large language models. Edge AI processors will be optimised for inferencing rather than training. This specialisation means organisations need to carefully consider their specific AI objectives before investing in infrastructure.


Baker compares NVIDIA's current dominance to Intel's position 20-30 years ago. "The Intel x86 architecture was just the beast. The 800-pound gorilla in the room." While specialised processors existed that were better at specific tasks, Intel became "a manufacturing machine and the architecture of choice." NVIDIA is following a similar path, building out an entire ecosystem of software and tools around its hardware.


Another emerging technology that excites Baker is CXL (Compute Express Link), which allows memory to be attached to the PCIe bus rather than directly to the processor. This enables greater memory capacity and bandwidth, as well as sharing of memory between processors in a cluster. "It's like getting rid of a little country lane and giving it a motorway, or freeways," as Baker puts it.


Planning for AI Success

As organisations navigate these complex infrastructure challenges, Baker offers some straightforward advice: "Make sure that you have an objective plan for it. What is it you're trying to get out of this AI? And then make sure you look at the full series of impacts that you're going to have by deploying it."


Those impacts extend beyond just the immediate hardware needs: "What is it going to do to memory? What is it going to do to networking? What is it going to do to storage? How am I going to share this data across there with my data orchestration to get the most out of it?"


For data centre planners, there are additional considerations around cooling, energy efficiency, power delivery, and even structural integrity. Many existing data centres can't be retrofitted for the latest cooling technologies because they weren't designed to support the weight of liquid cooling systems.


Power density is another challenge. As Baker explains, "When they built out and planned for however many thousands of square feet of floor space, well, the chips were only consuming 100 watts or 200 watts each. Now those processors are consuming 1,200 watts. So you can only fill one-sixth of your data centre floor space before you've consumed all the power."
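Put as arithmetic (with an assumed 1 MW facility power budget purely for illustration, which is not a figure from the podcast), the squeeze looks like this:

```python
# Illustrative numbers: the round per-chip figures from Baker's quote, plus an
# assumed 1 MW facility power budget that is not from the podcast.
provisioned_power_watts = 1_000_000   # assumed facility budget for illustration
planned_chip_watts = 200              # per processor when the floor was planned
current_chip_watts = 1_200            # per processor today

chips_planned = provisioned_power_watts / planned_chip_watts       # 5,000
chips_possible_now = provisioned_power_watts / current_chip_watts  # ~833

print(f"Usable fraction of the planned floor: {chips_possible_now / chips_planned:.2f}")
# Usable fraction of the planned floor: 0.17  (about one-sixth)
```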


The path to effective AI infrastructure isn't just about buying the latest GPUs—it's about taking a holistic view of the entire data pipeline, from storage to memory to processing and back again. By addressing bottlenecks like the memory wall, organisations can ensure they're getting the most value from their AI investments while building sustainable systems for the future.



Want to stay updated on the latest in AI infrastructure and project management? Subscribe to the Project Flux newsletter for weekly insights, tools, and expert perspectives. To learn more about ScaleFlux and their computational storage solutions, visit scaleflux.com.
