thunderbolt-ibverbs: We have InfiniBand at home

TL;DR

A developer built a Linux kernel module that makes Thunderbolt ports emulate InfiniBand devices, enabling high-speed RDMA communication between consumer mini PCs. This breakthrough could democratize AI training and inference at home.

A developer has created a Linux kernel module that makes the Thunderbolt 4 and USB4 ports on AMD mini PCs appear as InfiniBand devices, enabling high-speed RDMA between machines at home. If the approach holds up, consumer hardware could take on AI training and inference workloads that have traditionally required enterprise networking.

The project implements experimental RDMA-over-USB4 between two AMD mini PCs, specifically 128GB Strix Halo models, and reportedly reaches about 95 Gb/s of aggregate bidirectional throughput with around 7 microseconds of latency. The developer reports that the link is fast enough for tensor-parallel inference and Fully Sharded Data Parallel (FSDP) workloads. The examples given are a MiniMax-M2.7 inference run that exceeds the capacity of a single machine, and a Gemma 3 27B LoRA FSDP step that took just over two minutes, down from more than 21 minutes over Ethernet.

The custom Linux kernel module makes Thunderbolt ports appear as InfiniBand devices, so software can use RDMA (Remote Direct Memory Access) to exchange data between the two machines. The setup reportedly sustains around 48 Gb/s per direction, or about 95 Gb/s in aggregate, which the developer says is well ahead of standard Ethernet and soft-RoCE running over Thunderbolt networking. One-way latency is reported at about 7 microseconds, versus 28 to 65 microseconds for those other setups.
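On Linux, RDMA devices registered with the kernel's InfiniBand core show up as entries under /sys/class/infiniband. The sketch below lists those entries as a quick check that some RDMA device is visible. It assumes nothing about this project's module: the device name it would register is not given in the source, and the helper name `list_rdma_devices` is purely illustrative.

```python
from pathlib import Path


def list_rdma_devices(sysfs_root="/sys/class/infiniband"):
    """Return the sorted names of RDMA devices found under sysfs_root.

    Returns an empty list when the directory does not exist, which is
    what happens when no RDMA driver is loaded.
    """
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    return sorted(entry.name for entry in root.iterdir())


if __name__ == "__main__":
    devices = list_rdma_devices()
    if devices:
        print("RDMA devices visible to the kernel:")
        for name in devices:
            print("  " + name)
    else:
        print("No RDMA devices found under /sys/class/infiniband")
```

An empty result only means that no RDMA driver has registered a device. It says nothing about whether the Thunderbolt link itself is up.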

Why It Matters

The reported results suggest that high-performance, low-latency RDMA communication is achievable on consumer hardware using Thunderbolt ports, without costly enterprise networking gear. If the approach proves scalable and stable, hobbyists and small labs could run distributed AI workloads at home, reducing reliance on cloud services and expensive data center infrastructure.

Background

Traditionally, high-speed RDMA networks like InfiniBand have been confined to enterprise data centers and supercomputers because of their specialized hardware and complex setup. RDMA over Converged Ethernet (RoCE) brings RDMA to Ethernet but needs RDMA-capable network adapters, while soft-RoCE, which implements the protocol in software, lags in both throughput and latency. The developer's work builds on efforts to make RDMA more accessible by using the USB4 and Thunderbolt interfaces common on consumer PCs. The project is experimental: the developer describes it as research code with potential false assumptions and sharp edges, not intended for production use.

“This is experimental research code, most of it AI-generated, and it loads experimental kernel modules on machines I was willing to crash repeatedly.”

— the developer behind the project

“We built experimental RDMA-over-USB4 for 128GB Strix Halo mini PCs. It lets two consumer boxes talk fast enough to run tensor-parallel inference and FSDP workloads across both machines.”

— the developer

What Remains Unclear

It remains unclear how stable and scalable this solution is for long-term or production use, and whether it can be widely adopted across different hardware configurations. Further testing and development are needed to determine its practical viability.

What’s Next

The next steps involve refining the kernel modules for stability, testing across more hardware setups, and exploring potential integration into consumer operating systems. Broader community engagement and peer review are likely to follow to assess feasibility for wider use.

Key Questions

Can this setup be used for commercial or production AI workloads?

Currently, no. The project is experimental and not intended for production use. Stability and scalability are still under evaluation.

What hardware is required to replicate this setup?

At minimum, two AMD mini PCs with Thunderbolt 4 or USB4 ports, and the developer’s custom Linux kernel modules. Hardware compatibility and driver support are still being tested.

Does this mean consumers can build their own high-speed networks at home?

Potentially, yes, but current implementations are experimental. Widespread adoption will require further development and stability improvements.

How does this compare to traditional Ethernet or Wi-Fi for AI workloads?

According to the developer, RDMA over Thunderbolt offers one-way latency of about 7 microseconds, versus 28 to 65 microseconds for the Ethernet and Thunderbolt networking setups it was compared against, along with about 95 Gb/s of aggregate bidirectional throughput. Wi-Fi is typically slower than wired Ethernet and has higher latency.
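To see what those two figures mean for message sizes, here is a rough sketch using the common latency-plus-bandwidth cost model, in which the time to send a message is a fixed latency plus its size divided by the link rate. The model is a simplification, not something the developer describes. The per-direction rate is assumed to be half the reported 95 Gb/s aggregate, and the function names are illustrative.

```python
def transfer_time_s(size_bytes, latency_s, bandwidth_bps):
    """Estimated one-way transfer time: fixed latency plus serialization time."""
    return latency_s + (size_bytes * 8) / bandwidth_bps


def crossover_bytes(latency_s, bandwidth_bps):
    """Message size at which serialization time equals the fixed latency."""
    return latency_s * bandwidth_bps / 8


# Figures reported by the developer; the per-direction rate is an
# assumption (half of the ~95 Gb/s aggregate).
LATENCY_S = 7e-6
PER_DIRECTION_BPS = 95e9 / 2

if __name__ == "__main__":
    size = crossover_bytes(LATENCY_S, PER_DIRECTION_BPS)
    print(f"Crossover message size: {size:.0f} bytes")
    for n in (4 * 1024, 1024 ** 2, 1024 ** 3):
        t = transfer_time_s(n, LATENCY_S, PER_DIRECTION_BPS)
        print(f"{n:>12} bytes: {t * 1e6:.1f} microseconds")
```

Under these assumptions the crossover is roughly 42 kB. Messages smaller than that are dominated by latency, so the lower latency matters most for the many small exchanges in tensor-parallel inference, while bulk transfers are limited mainly by throughput.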

Source: Hacker News
