Multi-Chip Modules (MCMs)
Multi-Chip Modules (MCMs) integrate two or more bare dies (chiplets) onto a common substrate, creating a single functional module with higher performance and bandwidth than discrete chips on a printed circuit board (PCB). MCMs are widely used in CPUs, GPUs, AI accelerators, and networking ASICs, where die-to-die communication and integration density are critical. Unlike 3D ICs and stacked SiPs, which place dies vertically, MCMs typically interconnect dies side by side on a 2.5D interposer or organic substrate.
Process Overview
- Step 1: Known-good dies (KGDs) for logic, memory, and accelerators are sourced after wafer fabrication and wafer-level test.
- Step 2: Dies are mounted on a common organic or ceramic substrate with fine-pitch routing.
- Step 3: High-speed interconnects (e.g., Infinity Fabric, NVLink-C2C, UCIe) are used for die-to-die links.
- Step 4: Thermal management (heat spreaders, vapor chambers, liquid cooling) is integrated to handle module-level power density.
- Step 5: Modules undergo final testing before being delivered to board/system integrators.
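The five-step flow above can be sketched as a toy Python model. The `Die`/`Module` classes and `assemble_mcm` function are illustrative names invented for this sketch, not any real EDA or assembly tool's API:

```python
from dataclasses import dataclass, field

@dataclass
class Die:
    name: str          # e.g. "CCD0", "IO-die", "HBM0"
    known_good: bool   # passed wafer-level test (the KGD criterion)

@dataclass
class Module:
    substrate: str                     # "organic" or "ceramic" (Step 2)
    dies: list = field(default_factory=list)
    interconnect: str = "UCIe"         # die-to-die fabric (Step 3)
    tested: bool = False               # final module test (Step 5)

def assemble_mcm(candidates, substrate="organic", interconnect="UCIe"):
    """Keep only known-good dies, mount them on a common substrate,
    attach the die-to-die fabric, then mark the module final-tested."""
    kgds = [d for d in candidates if d.known_good]      # Step 1: KGD sourcing
    module = Module(substrate=substrate,
                    interconnect=interconnect,
                    dies=kgds)                          # Steps 2-3: mount + link
    module.tested = True                                # Step 5: final test
    return module

dies = [Die("CCD0", True), Die("CCD1", False), Die("IO-die", True)]
mcm = assemble_mcm(dies)
print(len(mcm.dies))   # prints 2 -- the failing die never reaches the module
```

The key property the sketch captures is that test happens twice: once per die before assembly, and once at module level before shipment.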
Key Features
- Performance Scaling: Achieves system-level performance beyond single-die reticle limits.
- Heterogeneous Integration: Allows CPU, GPU, and memory dies to be mixed in one module.
- Flexibility: Enables mixing dies from different process nodes and foundries.
- Yield Benefits: Improves yield vs monolithic mega-dies by assembling known-good dies (KGDs).
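The yield benefit can be made concrete with the classic Poisson die-yield model, Y = exp(-A * D0). The die areas and defect density below are illustrative numbers, not figures for any specific product:

```python
import math

def poisson_yield(area_mm2, d0_per_cm2):
    """Poisson die-yield model: Y = exp(-A * D0).
    Converts D0 from defects/cm^2 to defects/mm^2 before multiplying."""
    return math.exp(-area_mm2 * d0_per_cm2 / 100.0)

D0 = 0.2                            # defects per cm^2 (illustrative)
mono = poisson_yield(800, D0)       # one 800 mm^2 monolithic mega-die
chiplet = poisson_yield(100, D0)    # one 100 mm^2 chiplet

print(f"monolithic 800 mm^2 yield: {mono:.1%}")    # ~20%
print(f"per-chiplet 100 mm^2 yield: {chiplet:.1%}")  # ~82%
```

Because chiplets are binned as known-good dies before assembly, a module built from eight small dies wastes far less silicon per working product than one large die with ~20% yield, which is the economic argument behind chiplet-based MCMs.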
Representative Examples
| Module | Company | Composition | Applications |
|---|---|---|---|
| Grace Superchip | NVIDIA | 2 Grace CPUs on one module, connected via NVLink-C2C | AI & HPC servers |
| EPYC CPUs | AMD | Multiple CCD chiplets + I/O die | Datacenter and enterprise CPUs |
| Xeon Sapphire Rapids | Intel | CPU tiles bridged with EMIB on an organic substrate (HBM in Xeon Max variants) | Datacenter and HPC |
| MI300 | AMD | CPU + GPU + HBM in a stacked plus side-by-side MCM | Exascale AI/HPC |
Key Considerations
- Bandwidth: High-speed fabrics like NVLink, Infinity Fabric, or UCIe are critical for efficient scaling.
- Power Delivery: Requires advanced VRM designs and multi-layer organic substrates to handle high currents.
- Thermal Design: Large modules often need heat spreaders, vapor chambers, or liquid cooling.
- Reliability: Warpage and CTE (coefficient of thermal expansion) mismatches between dies and substrates are major challenges.
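The bandwidth and power-delivery points above lend themselves to back-of-envelope checks. All numbers here (lane count, lane rate, power budget, supply voltage) are illustrative assumptions, not specifications of any fabric or product:

```python
# Aggregate die-to-die bandwidth: lanes x per-lane rate, bits -> bytes.
lanes = 64                  # die-to-die lanes (illustrative)
gtps_per_lane = 32.0        # GT/s per lane (illustrative)
bandwidth_GBps = lanes * gtps_per_lane / 8.0   # raw link bandwidth in GB/s

# Supply current the substrate and VRMs must carry: I = P / V.
module_power_W = 500.0      # module power budget (illustrative)
core_voltage_V = 0.8        # die core supply rail (illustrative)
current_A = module_power_W / core_voltage_V

print(f"aggregate D2D bandwidth: {bandwidth_GBps:.0f} GB/s")       # 256 GB/s
print(f"supply current at {core_voltage_V} V: {current_A:.0f} A")  # 625 A
```

The second result shows why power delivery is listed as a key consideration: at sub-1 V core rails, even a 500 W module draws hundreds of amps, which the substrate's power planes must distribute without excessive IR drop.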
Market Outlook
MCMs are becoming the dominant integration strategy for AI and HPC as reticle limits constrain monolithic scaling. AMD popularized the modern chiplet approach with its EPYC CPUs, and NVIDIA, Intel, and Tesla are all advancing multi-die modules for GPUs, CPUs, and AI accelerators. By 2030, MCMs and chiplet-based architectures are expected to account for the majority of leading-edge compute devices, supported by industry standards such as UCIe for die-to-die interoperability.