
Deepinfra Raises $107M to Build Dedicated AI Inference Cloud for Open‑Source Models

Deepinfra secures $107 million to expand its dedicated AI inference cloud, targeting open-source models and autonomous agents.

Alex Mercer · 3 min read

Senior Tech Correspondent

Source: SiliconANGLE

TL;DR: Deepinfra closed a $107 million Series B round to scale its purpose-built AI inference cloud, which now runs hardware in eight U.S. data centers and serves autonomous agents that account for more than 30% of its token traffic.

Context

The AI market is shifting from experimental chatbots to production-grade, agent-driven workflows that require constant model calls. Traditional cloud providers, built for bursty workloads, struggle with the latency and cost spikes of such "always-on" inference. Deepinfra was founded to redesign the infrastructure stack, treating inference as a primary service rather than an afterthought.

Key Facts

- The Series B round, led by 500 Global and former Google cloud engineer Georges Harik, brought in $107 million from investors including Nvidia, Samsung Next, Supermicro and several venture firms.
- Deepinfra runs its own inference hardware in eight U.S. data centers, giving it full control over GPUs, networking and APIs. The company leverages Nvidia's Dynamo distributed-inference platform and the latest Blackwell and Vera Rubin GPUs, claiming up to 20 times better cost efficiency than generic cloud options.
- More than 190 open-source models, such as Nvidia's Nemotron family, are available on the platform. A zero-data-retention policy protects enterprise customers that cannot store sensitive data in the cloud.
- Autonomous AI agents generate more than 30% of the token volume Deepinfra processes, highlighting the growing demand for high-throughput, low-latency inference.

What It Means

By owning the hardware stack, Deepinfra can fine-tune performance and pricing for workloads that make dozens of model calls per task. This could lower barriers for enterprises adopting agentic AI, where unpredictable latency has previously driven up costs. The influx of capital positions the startup to expand beyond its current U.S. footprint, potentially adding more data centers and supporting a broader range of open-source models. As open-source AI approaches parity with proprietary systems, dedicated inference clouds may become a critical layer for scaling production AI.

Looking Ahead

Watch for Deepinfra's next data-center rollouts and any partnerships that extend its token-factory model to new regions or verticals.

