GPU & CPU Subsystem Boards
Subsystem boards are the integration layer where packaged CPUs, GPUs, and AI accelerators combine with memory, power delivery, and high-speed interconnects on a multi-layer printed circuit board to form a functional compute engine. The board is what actually plugs into a server, a robot chassis, or a vehicle: it is the assembled product that the downstream system integrator receives and connects into racks, enclosures, or platforms. Subsystem boards differ from multi-chip modules in that they operate at package-to-package scale on a PCB rather than die-to-die scale on a substrate, and from finished server boards in that they present a single coherent compute product — the "SXM module" or "OAM module" or "PCIe accelerator" — that later drops into a system motherboard.
The subsystem board layer is where AI accelerator product definitions happen. NVIDIA's H100 and H200 are not sold as bare chips; they are sold as SXM modules or PCIe cards that include the GPU, HBM, voltage regulators, and NVLink connectors. AMD's MI300 ships as OAM modules. Tesla's Dojo training tile is a subsystem board. The silicon is only the starting point; the subsystem board is what the customer buys and the level at which competitive architecture decisions are made. Concentration at this layer tracks silicon concentration because the same three or four companies (NVIDIA, AMD, Intel, with Tesla in custom AI training) that define the silicon also define the boards. Contract manufacturers and ODMs (Foxconn, Quanta, Wistron, Inventec) do the physical build, but the architecture is owned by the silicon company.
What a Subsystem Board Contains
| Component Class | Function | Scale at Top-End AI Boards |
|---|---|---|
| Compute package | CPU, GPU, or AI accelerator — often itself an MCM with advanced packaging | 700 W to 1000+ W TDP per package |
| Memory | HBM integrated on the compute package; or external memory modules on the same board | Multiple HBM stacks at 1+ TB/s per stack |
| Voltage regulator modules (VRMs) | Deliver hundreds of amperes of low-voltage current to the compute package | 1000+ A total; millivolt-scale regulation tolerance |
| Passives and clock management | Decoupling capacitors, inductors, clock generation, PLLs | Thousands of discrete passives per board |
| High-speed interconnects | PCIe Gen5/Gen6, NVLink, Infinity Fabric, CXL, OAM interconnect | Multi-terabit aggregate bandwidth per board |
| Thermal solution | Heat spreader, vapor chamber, or direct-to-die liquid cooling plate | Kilowatt-scale heat removal; liquid cooling standard for top-end AI |
| PCB | Multi-layer board with high-speed signal routing and current-carrying planes | 20+ layers with specialty materials (low-loss laminates, buried capacitance) |
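To make the composition concrete, here is a minimal sketch, in Python, of how these component classes roll up into a board-level power budget. The class name and every number (package TDP, VRM efficiency, miscellaneous load) are illustrative assumptions, not vendor specifications:

```python
from dataclasses import dataclass

@dataclass
class BoardPowerBudget:
    """Illustrative roll-up of the component classes in the table above."""
    package_tdp_w: float   # compute package TDP (GPU + on-package HBM)
    vrm_efficiency: float  # assumed end-to-end VRM conversion efficiency (0-1)
    misc_w: float          # retimers, clocking, management, passives

    def input_power_w(self) -> float:
        """Power the board draws at its input connector: VRM losses
        scale the package power up by 1/efficiency."""
        return self.package_tdp_w / self.vrm_efficiency + self.misc_w

    def heat_to_remove_w(self) -> float:
        """Nearly all input power ends up as heat the thermal
        solution must remove."""
        return self.input_power_w()

board = BoardPowerBudget(package_tdp_w=1000.0, vrm_efficiency=0.92, misc_w=60.0)
print(f"Board input power: {board.input_power_w():.0f} W")   # ~1147 W
print(f"VRM loss alone:    {1000.0 / 0.92 - 1000.0:.0f} W")  # ~87 W
```

Under these assumed numbers, a 1000 W package becomes roughly 1150 W at the board input, which is why the power and thermal columns in the table scale together.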
Board Form Factors
Subsystem boards have standardized form factors that define mechanical dimensions, connector interfaces, and power/thermal interfaces. Form factor choice determines which systems a board can plug into and how it is cooled.
| Form Factor | Origin / Governance | Typical Use |
|---|---|---|
| NVIDIA SXM | NVIDIA proprietary | NVIDIA data center GPUs (SXM5 for H100/H200, SXM-Next for Blackwell); up to 8 modules per server with NVLink |
| OCP Accelerator Module (OAM) | Open Compute Project standard | AMD Instinct MI300, Intel Gaudi, other non-NVIDIA accelerators; vendor-neutral alternative to SXM |
| PCIe accelerator card | PCI-SIG standard; various sizes (full-height full-length, single- and dual-slot) | Lower-TDP inference GPUs, networking accelerators, FPGAs; broader server compatibility |
| UBB (Universal Baseboard) | OCP standard baseboard for OAM | 8-OAM baseboards that form the accelerator complex in datacenter servers |
| HGX / MGX | NVIDIA reference designs | Multi-GPU baseboards for NVIDIA SXM modules (HGX); modular rack-level reference designs (MGX) |
| Custom AI training tiles | Vendor-specific (Tesla Dojo, Cerebras wafer-scale) | Custom AI training systems; tightly coupled silicon-and-system co-design |
Representative Boards
| Board | Owner | Composition |
|---|---|---|
| NVIDIA H100 / H200 SXM | NVIDIA | Hopper GPU + HBM3/HBM3E stacks on CoWoS + VRMs + NVLink; 700W TDP |
| NVIDIA Blackwell SXM | NVIDIA | Dual-die Blackwell GPU + HBM3E on CoWoS-L + higher-bandwidth NVLink; ~1000W TDP |
| NVIDIA GB200 Superchip | NVIDIA | Grace CPU + 2× Blackwell GPU on one module; NVLink-C2C between CPU and GPUs |
| AMD Instinct MI300X OAM | AMD | 3D-stacked GPU compute chiplets on base I/O dies + HBM3; OAM form factor for 8-module servers |
| Intel Gaudi 3 OAM | Intel (Habana Labs) | Custom AI accelerator + HBM; positioned against NVIDIA's H100 on cost per FLOP |
| Intel Ponte Vecchio / Max GPU | Intel | Tile-based GPU using Foveros and EMIB + HBM; deployed in the Aurora supercomputer |
| Tesla Dojo Training Tile | Tesla | 25 custom D1 AI dies in a tile with integrated liquid cooling; 9 petaflops per tile |
| Cerebras Wafer-Scale Engine | Cerebras | Single wafer-scale AI chip (largest chip ever built at each generation); custom cooling and power delivery |
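A quick consistency check on the Dojo row, using Tesla's publicly stated per-die figure of roughly 362 TFLOPS (BF16) for the D1; treat the number as approximate:

```python
# Dojo training tile arithmetic: 25 D1 dies per tile.
dies_per_tile = 25
tflops_per_die_bf16 = 362.0   # Tesla's stated D1 figure, approximate

tile_pflops = dies_per_tile * tflops_per_die_bf16 / 1000.0
print(f"~{tile_pflops:.2f} PFLOPS per tile")  # ~9.05, matching the ~9 PF claim
```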
Power & Thermal: The Limiting Factor
The primary engineering constraint on modern AI and HPC subsystem boards is not silicon capability; it is power and thermal density. A top-end AI board draws over 1000 watts and dissipates it over a footprint smaller than a laptop. Three engineering challenges compound.
First, power delivery: delivering 1000+ amperes at core voltages below one volt requires a voltage regulator module (VRM) complex that itself consumes board area and generates substantial heat. VRM efficiency matters — every percent of inefficiency is tens of watts of additional heat to remove. VRM innovation (vertical power delivery, integrated voltage regulators on the compute package, multi-phase power management) is an active engineering frontier and increasingly a competitive differentiator. Specialty suppliers for high-end VRMs include Infineon, MPS (Monolithic Power Systems), Renesas (Intersil), Alpha & Omega Semiconductor, and Vicor, with growing integration of power management ICs into the accelerator package itself.
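A back-of-envelope sketch of the arithmetic behind those claims. The 0.75 V core voltage, the efficiency points, and the 0.1 mΩ path resistance are illustrative assumptions, not measured values for any specific board:

```python
# Power-delivery arithmetic for a hypothetical 1000 W package at an
# assumed 0.75 V core voltage. All values are illustrative.
package_power_w = 1000.0
core_voltage_v = 0.75

current_a = package_power_w / core_voltage_v
print(f"Core current: {current_a:.0f} A")  # ~1333 A

# Heat generated by the VRM itself at various conversion efficiencies:
for eff in (0.95, 0.94, 0.93):
    loss_w = package_power_w / eff - package_power_w
    print(f"eff={eff:.0%}: VRM heat = {loss_w:.0f} W")
# ~53 W, ~64 W, ~75 W: each efficiency point costs roughly ten watts.

# Distribution-path loss: at ~1333 A, even 0.1 milliohm of board-path
# resistance dissipates I^2*R heat, which is the argument for vertical
# power delivery (shortening the current path under the package).
path_resistance_ohm = 0.0001
print(f"I^2R path loss: {current_a**2 * path_resistance_ohm:.0f} W")  # ~178 W
```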
Second, signal integrity: high-speed SerDes signaling at 100 Gbps per lane and above requires specialty PCB materials (low-loss laminates like Megtron 7/8, buried capacitance layers), careful stackup design, and sometimes retimers or redrivers on the board. At the bandwidths modern AI boards require, a few millimeters of trace length or a single extra via can determine whether a link closes with adequate margin.
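A rough loss-budget sketch for such a link. The channel budget, fixed losses, and per-inch laminate figures below are assumed round numbers for illustration, not values from any specific standard or datasheet:

```python
# Insertion-loss budget for a hypothetical 100 Gbps/lane PAM4 link.
# PAM4 carries 2 bits per symbol, so ~106 Gb/s raw is ~53 GBd.
signaling_gbaud = 53.125
nyquist_ghz = signaling_gbaud / 2   # loss is budgeted at the Nyquist frequency

loss_budget_db = 30.0        # assumed end-to-end channel budget at Nyquist
fixed_losses_db = 6.0        # assumed connectors, vias, package breakout
low_loss_db_per_inch = 1.1   # assumed premium laminate at ~26.6 GHz
std_fr4_db_per_inch = 2.0    # assumed conventional laminate, for comparison

for name, per_inch in [("low-loss laminate", low_loss_db_per_inch),
                       ("standard FR-4", std_fr4_db_per_inch)]:
    reach = (loss_budget_db - fixed_losses_db) / per_inch
    print(f"{name}: ~{reach:.1f} inches of usable trace")
# ~21.8 in vs ~12.0 in: laminate choice nearly doubles reach, and a
# retimer placed mid-route effectively resets the loss budget.
```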
Third, cooling: conventional air cooling is near its limit at 700W per package and unworkable at 1000W. Direct-to-die liquid cooling, cold plates integrated into board design, and rack-level liquid distribution are becoming standard for top-end AI deployments. Immersion cooling, long a niche technology, is in deployment at multiple hyperscaler sites. Board design increasingly assumes liquid cooling rather than treating it as an upgrade option. Cooling integration pushes the board-level design further into system co-design — the board cannot be cooled in isolation from the chassis and the rack-level cooling infrastructure.
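The flow-rate arithmetic that makes direct liquid cooling tractable follows from the standard relation Q = ṁ·c_p·ΔT; the 10 °C coolant rise below is an assumed design point, not a spec:

```python
# Coolant flow needed to carry away 1000 W: Q = m_dot * c_p * delta_T.
heat_w = 1000.0
cp_water = 4186.0   # J/(kg*K), specific heat of water
delta_t_k = 10.0    # assumed allowable coolant temperature rise

m_dot_kg_s = heat_w / (cp_water * delta_t_k)
flow_l_min = m_dot_kg_s * 60.0   # water is ~1 kg per litre
print(f"{m_dot_kg_s * 1000:.1f} g/s -> {flow_l_min:.2f} L/min per package")
# ~1.4 L/min per 1000 W package; an 8-GPU baseboard needs ~11.5 L/min.
# The flow itself is modest plumbing; the hard parts are cold-plate
# contact, leak-free quick disconnects, and rack-level distribution.
```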
Interconnects Beyond the Board
Subsystem boards must connect into larger topologies — multiple boards in a server, multiple servers in a rack, multiple racks in a cluster. The board-level interconnect determines what topologies are achievable.
| Interconnect | Role | Typical Reach |
|---|---|---|
| PCIe Gen5 / Gen6 | General-purpose host-to-accelerator and accelerator-to-storage | Within-server; host CPU to accelerator, accelerator to storage and networking |
| NVLink / NVLink Switch | NVIDIA-proprietary high-bandwidth GPU-to-GPU fabric | Within-server (typically 8 GPUs per baseboard); NVLink Switch extends the domain to rack scale (72 GPUs in GB200 NVL72) |
| Infinity Fabric | AMD-proprietary on-module, socket-to-socket, and accelerator-to-accelerator fabric | Within-module and within-server |
| CXL (Compute Express Link) | Coherent memory and accelerator interconnect over PCIe PHY | Within-server; emerging rack-scale with CXL 3.0 switches |
| InfiniBand / Ethernet (RoCE, Ultra Ethernet) | Rack-to-rack and cluster-scale networking | Multi-rack; cluster-scale AI fabrics (NVIDIA Spectrum-X, Ultra Ethernet Consortium) |
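The gap between the general-purpose host interface and the proprietary scale-up fabrics is easiest to see numerically. A sketch using public headline figures, rounded (PCIe Gen5 x16 at ~64 GB/s per direction; NVLink totals per GPU as published by NVIDIA):

```python
# Per-GPU bandwidth comparison, in GB/s. Headline numbers, rounded.
pcie_gen5_x16_per_dir = 64.0     # 32 GT/s x 16 lanes, per direction
nvlink4_h100_total = 900.0       # H100: 18 NVLink4 links, total
nvlink5_blackwell_total = 1800.0 # Blackwell: 1.8 TB/s total

ratio = nvlink4_h100_total / (2 * pcie_gen5_x16_per_dir)
print(f"NVLink4 vs bidirectional PCIe Gen5 x16: ~{ratio:.0f}x")  # ~7x
# The host link and the GPU-to-GPU fabric differ by most of an order of
# magnitude, which is why boards carry both and why topology is decided
# at the board level rather than in the host.
```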
Beyond the Datacenter
Subsystem boards are not a datacenter-only category. The same integration layer — package + memory + power + thermal + interconnect on a PCB — appears in several other domains, with different constraints and different operators building the boards.
| Domain | Representative Boards | Primary Constraints |
|---|---|---|
| Automotive inference | Tesla HW4 / AI5 FSD computer, NVIDIA Drive Thor, Mobileye EyeQ modules, Qualcomm Ride platform | Automotive functional safety (ISO 26262), thermal limits without datacenter-style liquid cooling, lifetime reliability, harsh environment |
| Humanoid & robotic control | Tesla Optimus compute board, Figure 02 compute, NVIDIA Jetson Thor-based robot brains | Weight, battery power, real-time inference latency, vibration tolerance |
| Drone perception | NVIDIA Jetson Orin/Thor-based drone compute, custom defense UAV boards | Weight, power, ruggedization; wireless comm integration |
| Industrial edge & IIoT | Industrial AI gateways; edge inference boxes; factory floor compute | Extended temperature range, long product lifetime, industrial bus integration |
The cross-pillar relevance of subsystem boards reaches beyond SemiconductorX: DatacentersX covers the rack-to-cluster scale-out; ElectronsX covers the vehicle and robotics applications where these inference boards serve as the compute brain; 5IREnterprise covers the industrial and energy-grid applications. Subsystem boards are the convergence point that connects silicon-level manufacturing to real-world intelligent systems.
Contract Manufacturing
The physical assembly of subsystem boards is largely done by contract manufacturers and ODMs rather than by the silicon companies themselves. NVIDIA, AMD, and Intel define the board architecture, own the firmware, and sell under their brand; Foxconn (Hon Hai), Quanta, Wistron, Inventec, Pegatron, and Compal do the build. The highest-complexity boards (NVIDIA SXM with advanced packaging integration, custom AI training tiles) often run through captive lines or a short list of premier partners; higher-volume, lower-complexity boards distribute across a broader ODM base. Taiwan dominates the contract manufacturing layer, with substantial overflow capacity in Mexico, Thailand, and Vietnam. This is the same ODM landscape that builds datacenter servers, smartphones, and consumer electronics.
Related Coverage
Parent: Module Integration
Sibling modules: Multi-Chip Modules (MCMs) · Memory Modules
Compute silicon consumers: AI Accelerators · GPUs · HBM
Cross-pillar relevance: Automotive & ADAS Chips · Humanoid Robot Semiconductors · Bottleneck Atlas