NVIDIA RTX Spark.
ARM-native LLM inference,
day one.
The RTX Spark Superchip (codenamed N1X) pairs a 20-core Grace CPU with a Blackwell RTX GPU on a single die, sharing 128 GB of LPDDR5X over NVLink C2C. BareMetalRT is built on NVIDIA's own CUDA + TensorRT-LLM stack, so the 1,500+ hand-optimized CUDA kernels that power BareMetalRT on desktop run unchanged on RTX Spark. No port, no rewrite, no waiting.
Specifications above are from NVIDIA's Computex announcement of the RTX Spark Superchip. BareMetalRT's day-one support claim rests on the published NVIDIA software stack (CUDA, TensorRT) running unchanged on RTX Spark — not on speculative hardware specs.
Same CUDA stack — zero porting
RTX Spark exposes the NVIDIA software stack natively. BareMetalRT ships 1,500+ hand-optimized CUDA kernels for attention, GEMM, and quantization through TensorRT-LLM. None of that is x86-specific. When RTX Spark ships, our engine ships with it.
Full-precision results
The accuracy that beats every other consumer inference setup — results that match a single GPU's output when a model is split across cards — isn't tied to a specific GPU SKU. It works wherever NVIDIA's CUDA cores work. That includes RTX Spark.
128 GB unified memory changes the math
RTX Spark ships with 128 GB of LPDDR5X shared between CPU and GPU over a 600 GB/s NVLink C2C bridge. A single Spark machine can hold a 70B-class model in one address space — no offload, no swap, no quantization compromises. BareMetalRT's KV cache + paged-state plumbing maps onto that directly.
Mesh-ready from day one
A single RTX Spark runs inference solo. Two of them on a home network run TP=2 over BareMetalRT's network transport — no NVLink between machines, no matched hardware, no Linux. The same heterogeneous tensor parallelism that lights up a 4070 + 4060 today lights up two Spark machines tomorrow.