Platform Support

NVIDIA RTX Spark.
ARM-native LLM inference, day one.

The RTX Spark Superchip (codenamed N1X) pairs a 20-core Grace CPU with a Blackwell RTX GPU on a single die, sharing 128 GB of LPDDR5X over NVLink C2C. BareMetalRT is built on NVIDIA's own CUDA + TensorRT-LLM stack, so the 1,500+ hand-optimized CUDA kernels that power BareMetalRT on desktop run unchanged on RTX Spark. No port, no rewrite, no waiting.


NVIDIA RTX Spark Superchip · Computex announcement

GPU: Blackwell RTX
6,144 CUDA cores · 1 PetaFLOP FP4 AI performance

CPU: 20-core Grace
Custom-built by NVIDIA × MediaTek

Unified Memory: 128 GB LPDDR5X
300 GB/s bandwidth; the model fits where the system has RAM

Interconnect: NVLink C2C
600 GB/s between CPU and GPU, 5× PCIe Gen 5

Software Stack: CUDA · TensorRT · NVFP4
Plus DLSS, Ray Tracing, Reflex, G-SYNC

BareMetalRT support: Day one
Same installer, same daemon, same engine cache

Specifications above are from NVIDIA's Computex announcement of the RTX Spark Superchip. BareMetalRT's day-one support claim rests on the published NVIDIA software stack (CUDA, TensorRT) running unchanged on RTX Spark — not on speculative hardware specs.

Why BareMetalRT on RTX Spark

Same CUDA stack — zero porting

RTX Spark exposes the NVIDIA software stack natively. BareMetalRT ships 1,500+ hand-optimized CUDA kernels for attention, GEMM, and quantization through TensorRT-LLM. None of that is x86-specific. When RTX Spark ships, our engine ships with it.

Full-precision results

Our accuracy edge over other consumer inference setups is that results match a single GPU's output when a model is split across cards. That property isn't tied to a specific GPU SKU. It works wherever NVIDIA's CUDA cores work, and that includes RTX Spark.

128 GB unified memory changes the math

RTX Spark ships with 128 GB of LPDDR5X shared between CPU and GPU, which are linked by a 600 GB/s NVLink C2C bridge. A single Spark machine can hold a 70B-class model in one address space with no offload, no swap, and no aggressive low-bit quantization just to make it fit: at 8-bit precision, 70B parameters come to about 70 GB of weights. BareMetalRT's KV cache and paged-state plumbing map onto that directly.
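As a rough check on the capacity claim above, the weight footprint of a model can be estimated from its parameter count and bits per weight. This is a back-of-the-envelope sketch, not BareMetalRT code: the function names are made up for illustration, sizes use decimal gigabytes, and only the weights are counted. The KV cache, activations, and whatever the operating system takes from the shared pool all need room on top.

```python
# Back-of-the-envelope weight footprint. All names here are illustrative
# and are not part of BareMetalRT or any NVIDIA API.
GB = 1e9  # decimal gigabytes, matching how memory capacity is usually quoted


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Return the memory needed for the weights alone, in GB."""
    return num_params * bits_per_weight / 8 / GB


def fits(num_params: float, bits_per_weight: int, capacity_gb: float = 128.0) -> bool:
    """True if the weights alone fit; KV cache and activations need extra room."""
    return weight_footprint_gb(num_params, bits_per_weight) <= capacity_gb


if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_footprint_gb(70e9, bits)
        print(f"70B @ {bits}-bit: {size:.0f} GB, fits in 128 GB: {fits(70e9, bits)}")
```

By this estimate, 70B parameters at 16 bits come to about 140 GB, which is more than 128 GB, so the single-machine case assumes 8-bit or lower-precision weights (about 70 GB and 35 GB respectively).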

Mesh-ready from day one

A single RTX Spark runs inference solo. Two of them on a home network run TP=2 over BareMetalRT's network transport — no NVLink between machines, no matched hardware, no Linux. The same heterogeneous tensor parallelism that lights up a 4070 + 4060 today lights up two Spark machines tomorrow.
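To make the idea of uneven TP=2 concrete, here is a toy sketch of a column-parallel linear layer split unequally across two devices. It is an illustration under stated assumptions, not BareMetalRT's implementation: plain Python lists stand in for GPU tensors, the helper names and the 60/40 split are hypothetical, and the network transport is not modelled at all.

```python
# Toy column-parallel linear layer split unevenly across two "devices".
# Pure Python lists stand in for GPU tensors; nothing here is BareMetalRT API.


def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, first_share):
    """Split w's columns into two shards; first_share is device 0's fraction."""
    n = len(w[0])
    cut = max(1, min(n - 1, round(n * first_share)))
    return [row[:cut] for row in w], [row[cut:] for row in w]


def tp2_forward(x, w, first_share=0.6):
    """Each device computes its own column shard; outputs are joined per row."""
    w0, w1 = split_columns(w, first_share)
    y0 = matmul(x, w0)  # would run on the faster device
    y1 = matmul(x, w1)  # would run on the slower device
    return [r0 + r1 for r0, r1 in zip(y0, y1)]


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6]]
    w = [[1, 0, 2, 1, 3], [0, 1, 1, 2, 0], [2, 1, 0, 1, 1]]
    assert tp2_forward(x, w) == matmul(x, w)
    print(tp2_forward(x, w))
```

Splitting by output columns leaves every dot product intact, so in this toy the sharded result equals the single-device result element for element, whatever the split ratio. A row-parallel split would instead sum partial results across devices, which changes the floating-point summation order.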

Available today on desktop · ready for RTX Spark

Run frontier models on the hardware you already own.