GPUs

Graphics Processing Units are massively parallel processors with thousands of compute cores, high-bandwidth memory stacks, and specialized matrix engines for mixed-precision arithmetic. Originally designed for graphics rendering, GPUs became the default AI training accelerator because of their architectural match to the linear algebra operations that define neural network training — and because NVIDIA built a software ecosystem (CUDA, cuDNN, NCCL, TensorRT) over fifteen years that made GPUs the path of least resistance for AI researchers and engineers.
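
The scale of that linear algebra workload is easy to approximate. A common rule of thumb from the scaling-law literature puts dense transformer training compute at roughly 6 FLOPs per parameter per training token. The sketch below applies that approximation; the model size, token count, and sustained throughput figures are illustrative assumptions, not vendor specifications, but they show why frontier training runs are measured in GPU-years.

```python
# Back-of-envelope training compute, using the common ~6 * params * tokens
# FLOP approximation for dense transformer training. All inputs below are
# illustrative assumptions, not measured or vendor-quoted figures.

def training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * params * tokens

def gpu_years(total_flops: float, sustained_flops_per_gpu: float) -> float:
    """Convert a FLOP budget into GPU-years at a sustained throughput."""
    seconds = total_flops / sustained_flops_per_gpu
    return seconds / (365 * 24 * 3600)

# Hypothetical 70B-parameter model trained on 2T tokens.
flops = training_flops(params=70e9, tokens=2e12)

# Assume ~400 TFLOP/s sustained mixed-precision throughput per GPU
# (an assumed utilization figure, not a datasheet number).
print(f"total: {flops:.3e} FLOPs")          # ~8.4e23 FLOPs
print(f"GPU-years: {gpu_years(flops, 400e12):.0f}")  # ~67 GPU-years
```

At these assumptions, a single mid-size training run consumes tens of GPU-years, which is why training demand aggregates into clusters of tens of thousands of units.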

The AI training GPU market is the most supply-constrained product category in semiconductors. NVIDIA H100, H200, and B200 face a stacked bottleneck structure — TSMC N4/N3 wafer allocation for the GPU die, CoWoS advanced packaging capacity at TSMC to integrate the die with HBM stacks, and SK Hynix HBM3E supply for the memory — where all three constraints operate on different physical resources and different expansion timelines. Relieving one does not relieve the others. This structure has made GPU supply the gating constraint on AI infrastructure buildout globally since 2023.

GPU Families — Products & Process

Family / platform | Flagship products | Process node & packaging | Suppliers
NVIDIA Hopper (datacenter) | H100 SXM5 (80GB HBM3, 3.35 TB/s); H100 PCIe; H200 SXM5 (141GB HBM3E, 4.8 TB/s); H100 NVL (dual-GPU NVLink-bridged pair) | TSMC N4 (GH100 die, 80B transistors); CoWoS-S packaging; HBM3/HBM3E memory stacks | NVIDIA (fabless); TSMC foundry; SK Hynix HBM (primary); Samsung HBM (qualifying)
NVIDIA Blackwell (datacenter) | B100 / B200 (dual-die, 192GB HBM3E, 8 TB/s per GPU); GB200 NVL72 (72 GPUs + 36 Grace CPUs in one NVLink domain); B200A (single die, air-cooled) | TSMC N4P (two reticle-limit compute dies on one package, joined by the NV-HBI die-to-die interconnect); CoWoS-L packaging; HBM3E (8 stacks × 24GB = 192GB) | NVIDIA (fabless); TSMC foundry; SK Hynix HBM3E (primary); Samsung qualifying; CoWoS-L at TSMC advanced packaging
NVIDIA Rubin (datacenter, roadmap) | R100 (Rubin GPU); NVLink generation 6; HBM4 memory; projected 2026+ volume | TSMC N3 or beyond (projected); HBM4 with logic base die (TSMC logic node); next-generation CoWoS | NVIDIA (fabless); TSMC; SK Hynix HBM4 (development)
NVIDIA GeForce RTX (consumer / inference) | RTX 5090 (GB202, 32GB GDDR7, 1.79 TB/s); RTX 5080 (16GB GDDR7); RTX 4090 (Ada Lovelace, 24GB GDDR6X); RTX 4070 Ti Super | TSMC N4P (RTX 5090/5080, Blackwell); TSMC N4 (RTX 4090, Ada); GDDR7 (Micron, Samsung) rather than HBM: a separate memory supply chain from datacenter GPUs | NVIDIA (fabless); TSMC foundry; Micron GDDR6X/GDDR7; Samsung GDDR7
AMD Instinct (datacenter) | MI300X (192GB HBM3, 5.3 TB/s, multi-chiplet GPU; the MI300A sibling adds CPU dies); MI325X (HBM3E variant); MI350 (CDNA 4, roadmap); MI400 series (CDNA 5, roadmap) | TSMC N5 (XCD compute chiplets); TSMC N6 (I/O dies); chiplet design: 8 GPU compute dies stacked on 4 I/O dies with 8 HBM3 stacks, integrated via TSMC SoIC and CoWoS | AMD (fabless); TSMC foundry; Samsung HBM3 (primary for MI300X); SK Hynix qualifying
AMD Radeon RX (consumer) | Radeon RX 9070 XT (RDNA 4, 16GB GDDR6); RX 7900 XTX (RDNA 3, 24GB GDDR6); RX 7600 (mainstream) | TSMC N4 (RDNA 4, Navi 48); TSMC N5 (RDNA 3, Navi 31 compute die); GDDR6 memory (mature supply relative to HBM) | AMD (fabless); TSMC foundry; Samsung/SK Hynix GDDR6
Intel Arc / Battlemage (consumer) | Arc B580 (Battlemage, 12GB GDDR6); Arc A770 (Alchemist, 16GB GDDR6); Flex series (datacenter inference, limited deployment) | TSMC N5 (Battlemage BMG-G21); TSMC N6 (Alchemist ACM); GDDR6 memory | Intel (Arc fabricated at TSMC); limited datacenter GPU traction; no meaningful AI training share

Deployment & Supply Chain Risk

Platform | Deployment focus | Primary supply chain risk
NVIDIA H100 / H200 | AI training clusters (hyperscaler and enterprise); LLM training; inference cloud (H200's larger memory favors inference); HPC simulation | Stacked bottleneck: TSMC N4, CoWoS-S, and SK Hynix HBM3/HBM3E simultaneously constrained; multi-year wait lists at peak demand
NVIDIA B200 / GB200 NVL72 | Next-generation AI training superclusters; inference at scale for frontier models; robotics sim-to-real compute | CoWoS-L packaging for the dual-die B200 is a new capacity ramp; the 8-stack HBM3E configuration deepens SK Hynix concentration; GB200 NVL72 rack integration complexity adds ODM/system supply chain risk
AMD MI300X / MI325X | AI training and inference (Microsoft Azure ND MI300X, Oracle, Meta); LLM inference (large memory capacity advantage over H100) | TSMC N5 shared with AMD EPYC; advanced-packaging yield on the multi-die stack; ROCm software ecosystem maturity gap vs CUDA limits the customer adoption ceiling
Consumer GPU (RTX 5090, RX 9070) | Edge inference (local LLM, Stable Diffusion); gaming; content creation; ML research on workstations | TSMC N4P shared with datacenter GPUs; GDDR7 supply ramp (Micron, Samsung); consumer allocation is a lower priority than datacenter at NVIDIA

The Stacked Bottleneck

The NVIDIA H100/B200 supply chain is the defining bottleneck in AI infrastructure. Three independent constraints apply simultaneously to every GPU that ships:

TSMC N4/N4P wafer starts. The GPU compute die is fabricated at TSMC N4 (Hopper) or N4P (Blackwell), with N3-class nodes projected for Rubin. This wafer allocation competes with AMD EPYC, Apple M-series, Qualcomm, and every other fabless customer at leading-edge nodes. TSMC prioritizes its largest customers, and NVIDIA's datacenter GPU volumes make it a top-tier allocation priority, but the pool is still finite and shared.

CoWoS packaging at TSMC. After wafer fabrication, the GPU die must be integrated with HBM stacks on a silicon interposer using TSMC's CoWoS process. CoWoS capacity is physically separate from wafer fab capacity and was the binding shipment constraint on H100 supply during 2023–2024. TSMC has been expanding CoWoS aggressively, but the lead time means this remains a structural constraint through the Blackwell generation.

SK Hynix HBM3E supply. SK Hynix supplies the dominant share of HBM3E for B200; Samsung is qualifying as a second source and Micron is ramping as a third, but qualifying each additional supplier is a multi-year process. The B200's 8-stack × 24GB = 192GB HBM3E configuration consumes significantly more HBM wafer area per GPU than the H100's five-stack, 80GB HBM3 configuration.
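
Because the three constraints bind on different physical resources, shipped GPU volume behaves like the minimum of three independent capacities, not their sum. The sketch below makes that structure explicit; every capacity number in it is an invented placeholder for illustration, not an estimate of actual TSMC or SK Hynix output.

```python
# Minimal model of the stacked bottleneck: shipments are gated by the
# scarcest of three independent resources. All numbers are invented
# placeholders, not real capacity figures.

def shippable_gpus(wafer_capacity: int,    # dies available from N4/N4P wafer starts
                   cowos_capacity: int,    # packages the CoWoS lines can assemble
                   hbm_stacks: int,        # HBM3E stacks available
                   stacks_per_gpu: int) -> int:
    """GPUs that can actually ship given all three constraints."""
    return min(wafer_capacity, cowos_capacity, hbm_stacks // stacks_per_gpu)

base = shippable_gpus(wafer_capacity=120_000,
                      cowos_capacity=90_000,
                      hbm_stacks=600_000,
                      stacks_per_gpu=8)   # B200-style 8-stack configuration

# Doubling wafer starts changes nothing while CoWoS and HBM still bind:
more_wafers = shippable_gpus(240_000, 90_000, 600_000, 8)

print(base, more_wafers)  # both 75000: HBM (600k // 8) binds, not wafers
```

The same arithmetic shows why Blackwell tightens the HBM leg: at 8 stacks per GPU versus the H100's 5, a fixed supply of stacks packages 37.5% fewer GPUs.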

CUDA Ecosystem Lock-In

NVIDIA's competitive moat is not the GPU die; it is the CUDA software ecosystem built over fifteen years. PyTorch, TensorFlow, and JAX are all CUDA-native. cuDNN, cuBLAS, NCCL, and TensorRT are NVIDIA-proprietary libraries with no fully drop-in equivalents. The majority of AI research code is written assuming CUDA availability. AMD's ROCm ecosystem has made significant progress and supports the major frameworks, but the friction of porting CUDA kernels, the performance gap on custom kernel operations, and the smaller pool of ROCm-experienced engineers create a switching cost that is not primarily financial: it is time and engineering capacity. This software lock-in is a supply chain risk because it concentrates AI infrastructure demand on a single GPU architecture in a way that cannot be diversified simply by qualifying a second hardware supplier.
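
The lock-in shows up at the first line of typical training code. The fragment below is an illustration using standard PyTorch device APIs (the tensor shapes are arbitrary examples); notably, ROCm builds of PyTorch deliberately expose themselves through the same torch.cuda namespace precisely because so much existing code hard-codes it.

```python
# Illustration of how deeply "cuda" is assumed in everyday training code.
# Uses standard PyTorch APIs; the tensor shapes are arbitrary examples.
import torch

# The idiom found throughout AI research code: CUDA or bust.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ROCm builds of PyTorch answer True to torch.cuda.is_available() and accept
# the "cuda" device string, masquerading as CUDA so unmodified code runs.
# torch.version.hip is the tell: it is non-None only on a ROCm build.
backend = "ROCm/HIP" if getattr(torch.version, "hip", None) else "CUDA/CPU"

x = torch.randn(4096, 4096, device=device)
y = x @ x  # dispatched to cuBLAS on NVIDIA hardware, rocBLAS on AMD
print(device, backend, y.shape)
```

That masquerade removes the first-line friction, but custom CUDA kernels, NCCL-tuned collectives, and TensorRT deployment paths still require porting, which is where the switching cost concentrates.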

Export Controls & Supply Chain Bifurcation

US export controls on advanced AI GPUs — implemented in successive rounds from October 2022 through 2024 — have created a bifurcated GPU supply chain. NVIDIA H100, H200, and B200 are restricted for export to China and a broad set of countries without specific licenses. NVIDIA has developed compliant variants (H20, L20, L2) with reduced compute performance that meet export control thresholds — these are the only NVIDIA datacenter GPUs legally available in China. Huawei Ascend 910B and 910C have emerged as the primary domestic Chinese alternative, fabricated at SMIC on a 7nm-class process. The bifurcation creates two parallel GPU supply chains: the TSMC/NVIDIA/SK Hynix chain serving Western AI infrastructure, and the SMIC/Huawei chain serving Chinese AI infrastructure — with different process nodes, different software ecosystems, and different memory architectures.

Related Coverage

Compute & Logic Hub | CPUs | AI Accelerators | HBM Supply Chain | AI Inference & Edge Compute SoCs | CoWoS Advanced Packaging | Semiconductor Bottleneck Atlas | NVIDIA Spotlight

Cross-Network — ElectronsX Demand Side

Every AI model deployed in an EV, AV, or humanoid robot was trained on GPU clusters. The training infrastructure — H100/B200 clusters consuming megawatts of power — is the upstream demand signal for the datacenter power, cooling, and grid interconnection buildout covered across ElectronsX. Robotics sim-to-real pipelines generating synthetic training data for robot manipulation and navigation are an emerging GPU demand category tied directly to humanoid robot deployment scale.
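
The power framing can be made concrete with simple arithmetic. The sketch below estimates facility-level draw for a training cluster from per-GPU power; the TDP, server overhead, and PUE values are representative planning assumptions, not measurements of any specific deployment.

```python
# Back-of-envelope cluster power draw. All inputs are representative
# assumptions (e.g., ~700 W is the commonly cited H100 SXM TDP; the server
# overhead factor and PUE are typical planning values, not measurements).

def cluster_power_mw(n_gpus: int,
                     gpu_tdp_w: float = 700.0,     # assumed per-GPU TDP
                     server_overhead: float = 1.5, # CPUs, NICs, fans, storage
                     pue: float = 1.3) -> float:   # cooling and power losses
    """Estimated facility power in megawatts for a GPU training cluster."""
    it_load_w = n_gpus * gpu_tdp_w * server_overhead
    return it_load_w * pue / 1e6

# A hypothetical 16,384-GPU training cluster:
print(f"{cluster_power_mw(16_384):.1f} MW")  # ~22.4 MW at these assumptions
```

At these assumptions a single mid-size cluster draws over 20 MW continuously, which is the demand signal the ElectronsX power and grid coverage tracks.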

EX: ADAS/AV Compute Architecture | EX: Humanoid Robots | EX: Supply Chain Convergence Map