Infrastructure Intelligence · Vectors Capital
Data Center Architecture
Visual guide to every subsystem · investment signals per layer · 21 custom illustrations
$500B+ · 2025 Global DC Spend
40% · AI Workload Growth / yr
120kW · GB200 NVL72 Rack Power
3–5yr · Grid Queue Backlog
1.2–1.5 · Target PUE
99.995% · Tier IV Uptime SLA
⚡ Power Infrastructure

A modern hyperscale data center draws 10–200 MW of electricity — enough to power tens of thousands of homes. Utility power arrives at high voltage (typically 138 kV), is stepped down through on-site transformers, and then flows through a carefully engineered chain: UPS systems (online double-conversion, now predominantly lithium-ion) buffer the 10–15 seconds required for diesel generators to reach full speed. PDUs (Power Distribution Units) deliver metered, dual-feed power to every rack. The entire chain is designed around N+1 or 2N redundancy — no single point of failure. Power efficiency is measured by PUE (Power Usage Effectiveness); world-class facilities hit 1.1–1.2, meaning only 10–20% of power is lost to overhead. For investors, the acute grid interconnection backlog (3–5 years in most US markets) is driving demand for behind-the-meter generation: onsite solar, battery storage, hydrogen fuel cells, and microgrids are all active investment themes.
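The arithmetic behind those figures is worth internalizing. A minimal Python sketch (function and variable names are ours; all inputs come from the ranges above) of how PUE, the conversion chain, and the UPS bridge interact:

```python
# Back-of-envelope power math. All inputs come from the figures in
# this section; the function and variable names are ours.

def chain_efficiency(stages):
    """Compound per-stage efficiencies (transformer x UPS x PDU)."""
    eff = 1.0
    for stage in stages:
        eff *= stage
    return eff

# ~96% transformer x 96% double-conversion UPS x 98% PDU
eff = chain_efficiency([0.96, 0.96, 0.98])
print(f"Conversion chain efficiency: {eff:.1%}")          # ~90.3%

# PUE = total facility power / IT power, so the overhead share of the
# utility feed at a world-class PUE of 1.15 is:
pue = 1.15
print(f"Overhead share of feed: {1 - 1 / pue:.1%}")       # ~13%

# UPS bridge: energy the batteries must carry while generators start.
it_load_mw, bridge_s = 100.0, 15
bridge_kwh = it_load_mw * 1_000 * bridge_s / 3_600
print(f"Battery energy for a {bridge_s}s bridge at {it_load_mw:.0f} MW: "
      f"{bridge_kwh:,.0f} kWh")
```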

UPS Battery Bank
Online double-conversion · Li-ion · <2ms transfer
Diesel Backup Generator
3–5 MW · starts <15 sec · 72hr fuel tank
PDU & Power Delivery
Per-outlet metering · dual A+B feeds
Grid Voltage: 138 kV → 480 V, stepped down on site
UPS Type: Online double-conversion, Li-ion
UPS Bridging: 10–15 sec until generators start
Generator: 3–5 MW diesel, <15 sec start, 72 hr fuel
PDU: Per-outlet metering, dual A+B path
Efficiency: ~96% (transformer) × 96% (UPS) × 98% (PDU)
💡 Grid interconnection queues are 3–5 years in most US markets — behind-the-meter solar + BESS is the key investment unlock.
❄️ Cooling Systems

Cooling is the defining infrastructure challenge of the AI era. Traditional air cooling tops out at roughly 15–25 kW per rack. A rack of NVIDIA DGX H100 systems already draws ~82 kW, and GB200 NVL72 racks hit 120 kW — four to eight times what air can handle. The industry response is liquid cooling: CDUs (Coolant Distribution Units) circulate water or dielectric fluid through cold plates attached directly to GPUs, extracting heat at the source. At the facility level, cooling towers and chillers reject heat to the atmosphere, consuming 1–3 million gallons of water per day per 100 MW of compute. European regulations increasingly require waste heat reuse — piping excess heat to district heating networks. For investors, liquid cooling infrastructure (cold plates, CDUs, immersion tanks, manifolds) represents the highest-urgency capex category, with retrofit demand from existing air-cooled facilities stacking on top of new builds.
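To make the air-vs-liquid gap concrete, a back-of-envelope sketch using Q = ṁ·c·ΔT with the supply/return temperatures quoted in the spec table below (illustrative physics, not a design calculation):

```python
# Why liquid wins: heat removal Q = m_dot * c_p * dT for one 120 kW
# GB200 NVL72 rack, using the supply/return temperatures quoted in
# this section. Illustrative physics, not a design calculation.

Q = 120 * 1_000                           # rack heat load, watts

# Water loop: 20 C supply -> 44 C return (dT = 24 K)
C_P_WATER = 4186.0                        # J/(kg*K)
m_dot = Q / (C_P_WATER * 24)              # kg/s; ~1 kg ~= 1 L for water
print(f"Water flow: {m_dot * 60:.0f} L/min")              # ~72 L/min

# Air loop: 18 C supply -> 45 C return (dT = 27 K)
C_P_AIR, RHO_AIR = 1005.0, 1.2            # J/(kg*K), kg/m^3
vol = Q / (C_P_AIR * RHO_AIR * 27)        # m^3/s
print(f"Air flow: {vol:.1f} m^3/s (~{vol * 2119:.0f} CFM)")
# Roughly 7,800 CFM through a single rack: far beyond what hot/cold
# aisle containment can deliver, hence cold plates at the chip.
```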

Evaporative Cooling Tower
Final heat rejection · 1–3M gal/day at 100MW
Direct Liquid Cooling — CDU
GPU cold plates · handles 100kW+ racks
Hot/Cold Aisle Containment
18°C supply · 45°C return · raises efficiency 20%
Air Cooling Limit: ~25 kW/rack — exceeded by any AI GPU rack
H100 Rack Power: 82 kW/rack · GB200 NVL72 hits 120 kW
CDU Coolant: 20°C supply / 44°C return, water or dielectric
Chiller COP: 5.2 (5.2 W of cooling per 1 W of electricity)
Cooling Tower: 1,200 gal/min · evaporative · final heat sink
Water Use: 1–3M gallons/day per 100 MW facility
💡 Liquid/immersion cooling is the single most urgent infrastructure shift — air cooling cannot keep up with 100 kW+ AI GPU racks.
🔗 Network Fabric

Data center networking operates at two distinct layers. The front-end Ethernet fabric connects servers to the outside world using a spine-leaf topology: a small number of high-radix spine switches (64 ports × 400G = 25.6 Tb/s per switch) interconnect a larger tier of leaf (ToR) switches. This design provides predictable, low-latency paths and easy horizontal scaling. The back-end GPU fabric is entirely different — AI training requires all-to-all collective communication (AllReduce, AllGather) at extreme bandwidth. InfiniBand HDR/NDR (400–800 Gb/s, 600 ns latency, zero-copy RDMA) dominates today, while RoCE (RDMA over Converged Ethernet) provides a lower-cost alternative. The emerging frontier is silicon photonics and co-packaged optics (CPO): replacing copper links with optical interconnects directly integrated into switch ASICs, potentially reducing networking power by 5–10×. Lightmatter, Ayar Labs, and Intel Silicon Photonics are key companies to watch.
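A rough sizing sketch using the port counts quoted here (simplified two-tier math; real fabrics add pods and super-spines to reach 100k+ servers):

```python
# Sketch of two-tier spine-leaf sizing with the port counts quoted in
# this section; simplified (ignores breakouts and multi-pod designs).

# Front-end leaf (ToR): 48x25G down to servers, 2x100G up to spines.
down_gbps = 48 * 25                     # 1,200 Gb/s toward servers
up_gbps = 2 * 100                       # 200 Gb/s toward spines
print(f"ToR oversubscription: {down_gbps // up_gbps}:1")   # 6:1

# In a full-mesh two-tier Clos each leaf takes one uplink per spine,
# so a 64-port 400G spine layer caps one pod at 64 leaves.
spine_ports, servers_per_leaf = 64, 48
print(f"Max servers in one two-tier pod: {spine_ports * servers_per_leaf:,}")

# The back-end GPU fabric budgets bandwidth per GPU instead:
# 400 Gb/s InfiniBand per GPU, 8 GPUs per node.
gpus_per_node, nic_gbps = 8, 400
print(f"Per-node GPU fabric: {gpus_per_node * nic_gbps // 8} GB/s")
```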

Spine-Leaf Clos Topology
BGP/ECMP · non-blocking · scales to 100k+ servers
InfiniBand GPU Fabric
400Gb/s · 600ns latency · RDMA zero-copy
Top-of-Rack Switch
48×25G downlinks · 2×100G uplinks · 1 per rack
Topology: Spine-leaf Clos · non-blocking · BGP/ECMP
Spine Switch: 64-port 400G · 25.6 Tb/s per switch
GPU Interconnect: InfiniBand 400Gb/s · RDMA · 600ns latency
NVLink (intra-server): 900 GB/s per GPU across an 8× H100 node
Next Wave: Silicon photonics / CPO — 5–10× power reduction
Protocol: RoCEv2 or InfiniBand for GPU-to-GPU RDMA
💡 Silicon photonics (Lightmatter, Ayar Labs) is the deep tech bet — co-packaged optics cut I/O power 5–10× vs copper.
🖥️ Compute Hardware

Modern AI workloads have bifurcated the compute market. Training demands massive, tightly coupled clusters: NVIDIA's DGX H100 (8 GPUs, 640 GB HBM3, 900 GB/s NVLink per GPU, 10,200W) is today's gold standard, while the GB200 NVL72 (72 GPUs per rack, 120 kW) defines the next generation. Inference at scale favors efficient, purpose-built hardware: custom ASICs from Google (TPU), Meta (MTIA), and Microsoft (Maia) deliver far better performance-per-watt than general-purpose GPUs for specific model architectures. CPU servers remain essential for general-purpose compute, orchestration, and data preprocessing — a typical cluster runs roughly 1 GPU node per 4 CPU nodes. The capital investment is staggering: a single DGX H100 costs ~$400K, so a 1,000-GPU cluster approaches $50M in servers alone, before networking, power, and cooling. GPU utilization optimization (RunAI, CoreWeave's scheduling software) can cut effective compute cost by 40–60% by moving typical utilization from 30% to 70%+ — a software-defined capital efficiency play with no hardware cost.
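The utilization claim is simple arithmetic. A sketch assuming the ~$400K DGX price above and a 3-year amortization window (the amortization period is our assumption):

```python
# Effective cost per useful GPU-hour vs. utilization, using the ~$400K
# DGX H100 price from this section; 3-year straight-line amortization
# is our assumption, and opex is ignored for simplicity.

DGX_PRICE_USD = 400_000
GPUS_PER_DGX = 8
HOURS_3YR = 3 * 365 * 24                     # 26,280 h

capex_per_gpu_hour = DGX_PRICE_USD / GPUS_PER_DGX / HOURS_3YR

for utilization in (0.30, 0.70):
    effective = capex_per_gpu_hour / utilization
    print(f"{utilization:.0%} utilization -> ${effective:.2f}/useful GPU-hour")

# 30% -> ~$6.34, 70% -> ~$2.72: a ~57% cut, squarely in the quoted
# 40-60% range, with zero additional hardware.
```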

NVIDIA DGX H100 — 8× GPU Server
10,200W · NVLink 4.0 · dense racks demand liquid cooling
CPU Server Rows
Dual-socket · 1U/2U · 80+ Titanium PSUs
Custom AI ASIC Tray
TPU / Trainium / Maia — 3–10× perf-per-watt vs GPU
H100 DGX: 8 GPUs · 10,200W · ~$400K/unit · NVLink 900 GB/s per GPU
GB200 NVL72: 72 GPUs · 120 kW/rack · requires immersion or DLC
Custom ASICs: Google TPU v5, AWS Trainium2, Microsoft Maia
GPU Utilization: Industry avg ~30% — RunAI/Volcano push to 70%+
DPU / SmartNIC: Offloads networking/security from CPU — NVIDIA BlueField
Form Factor: 1U/2U CPU · 4U–8U GPU trays · OCP designs
💡 GPU utilization optimization (RunAI, acquired by NVIDIA for ~$700M) — improving utilization from 30% to 70% more than halves effective compute cost.
💾 Storage Systems

AI storage requirements are defined by two competing demands: capacity (storing petabytes of training data, model checkpoints, and inference logs) and throughput (feeding GPU clusters fast enough that compute is never starved). Traditional SANs and NAS are insufficient — AI training at scale requires parallel file systems (Lustre, GPFS, BeeGFS, Weka) that stripe data across hundreds of NVMe drives and deliver aggregate throughput of hundreds of GB/s. For datasets and checkpoints, dense object storage (S3-compatible) at commodity cost sits behind the high-performance tier. The architecture is typically three-tier: ultra-fast NVMe cache for active training data → parallel file system for warm data → object store for cold data and long-term retention. VAST Data ($9B valuation) validated the market for unified high-performance AI storage. NVMe drives have replaced HDDs for performance tiers: a single NVMe drive delivers 7 GB/s vs. ~300 MB/s for spinning disk, at a cost per GB that keeps falling as 3D NAND scales.
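A quick sizing sketch of the hot tier and the per-tier economics, using the throughput and $/GB figures from the spec table below (illustrative only):

```python
import math

# Hot-tier sizing and per-tier economics from this section's figures:
# 7 GB/s per NVMe drive, a 5 TB/s parallel-file-system target, and the
# $/GB prices in the spec table below.

TARGET_TBPS = 5.0
NVME_GBPS = 7.0
drives = math.ceil(TARGET_TBPS * 1_000 / NVME_GBPS)
print(f"Drives to sustain {TARGET_TBPS} TB/s: {drives}")          # 715

# Cost per petabyte across the three tiers
for tier, usd_per_gb in (("NVMe", 0.10), ("HDD", 0.015), ("Object", 0.005)):
    print(f"{tier:>6}: ${usd_per_gb * 1e6:>9,.0f} per PB")
```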

HDD vs NVMe Side-by-Side
HDD: $0.015/GB · NVMe: $0.10/GB · ~350× lower latency
High-Density Storage Array
60× 20TB HDDs in 4U · 1.2 PB per shelf
Parallel File System (WekaIO/VAST)
Feeds GPU clusters at 5+ TB/s aggregate
NVMe SSD: 7 GB/s · 10–30µs · $0.10/GB — GPU scratch
HDD (NL-SAS): 300 MB/s · 5ms · $0.015/GB — bulk storage
Object Store: $0.005/GB — model checkpoints, datasets
Parallel FS: WekaIO / VAST Data — 5+ TB/s to GPU clusters
VAST Data: $9B valuation — validated AI storage market
CXL Memory Pooling: Next wave — disaggregated shared DRAM across servers
💡 VAST Data ($9B) proved hyperscale AI storage is a real market. CXL memory pooling is the next architectural shift.
🏗️ Floor Layout

The physical floor plan of a data center is an engineered airflow system. The dominant pattern is the hot aisle / cold aisle arrangement: racks face each other front-to-front (cold aisle, where cool air enters) and back-to-back (hot aisle, where exhaust exits). Cold air is supplied through perforated tiles in a raised floor plenum and hot exhaust is collected at ceiling level and returned to CRAC/CRAH units. At high GPU densities (>25 kW/rack) this approach fails — hot spots form, and air mixing degrades efficiency. Alternatives include in-row cooling (CRACs between rack rows), rear-door heat exchangers, and for the highest densities, direct liquid cooling that eliminates the air loop entirely. Modular designs — prefabricated data center modules (PDCMs) deployed as self-contained units — allow capacity to be added in 2–4 MW increments without full facility builds, dramatically reducing time-to-capacity from 24–36 months to 6–9 months. Site selection increasingly factors in proximity to renewable power, water availability (for cooling towers), seismic risk, and geopolitical stability.
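A rough floor-plan calculation under stated assumptions (the 2–4 MW module size comes from above; the 20-racks-per-row figure and the row/aisle counts are ours, for illustration):

```python
# Floor-plan arithmetic for a modular buildout. Module size comes from
# this section (2-4 MW); the 20-racks-per-row figure is our assumption.

FACILITY_MW = 100
MODULE_MW = 3
RACKS_PER_ROW = 20

modules = -(-FACILITY_MW // MODULE_MW)           # ceiling division
print(f"{FACILITY_MW} MW ~= {modules} x {MODULE_MW} MW modules")

for rack_kw in (15, 25, 120):                    # air floor, air limit, NVL72
    racks = MODULE_MW * 1_000 // rack_kw
    rows = -(-racks // RACKS_PER_ROW)
    print(f"{rack_kw:>4} kW/rack: {racks:>3} racks -> {rows} rows, "
          f"{rows + 1} alternating hot/cold aisles")
```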

Hot/Cold Aisle — Top View
3 rows · 2 aisles · overhead cable tray
Server Racks — Cold Aisle View
18°C intake · perforated floor tiles · containment curtains
Modular DC Container
250–500kW per module · 6-week deploy time
Layout: Hot/cold aisle alternating rows — standard design
Floor Type: 18" raised floor — perforated tiles supply cold air
Cold Aisle Temp: 18°C supply air from underfloor plenum
Hot Aisle Temp: 45°C return air → CRAC units → chilled again
Containment: Curtains/doors seal aisles — lowers PUE by 0.1–0.2
Cable Management: Overhead fiber trays + power busway on separate pathways
💡 Modular containerized DCs (Vertiv, Schneider) cut time-to-capacity from 24–36 months to 6–9 months, with individual modules deployable in ~6 weeks — critical for AI buildout speed.
📊 Operations & DCIM

A 100 MW data center is a real-time physical system with tens of thousands of sensors generating continuous streams of temperature, power, humidity, and vibration data. DCIM (Data Center Infrastructure Management) software aggregates this telemetry into a unified operational picture — PUE dashboards, capacity planning heat maps, predictive maintenance alerts, and automated remediation workflows. The next frontier is AI-driven operations: DeepMind's reinforcement learning system cut the energy used for cooling at Google's data centers by up to 40%; similar approaches from startups (Vigilant, nOps, Arcadia) target power optimization, workload placement, and failure prediction. Carbon-aware scheduling — shifting compute to times and locations with lower grid carbon intensity — can reduce Scope 2 emissions by 20–30% using software alone, with no hardware change. This is perhaps the highest-leverage, lowest-cost decarbonization lever in the data center stack and represents a significant investment opportunity where pure-software margins meet a regulatory tailwind.
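A toy sketch of carbon-aware scheduling: given an hourly grid-intensity forecast (the numbers here are hypothetical; production systems pull forecasts from providers such as WattTime or Electricity Maps), defer a batch job into the cleanest window:

```python
# Toy carbon-aware scheduler: shift a deferrable batch job into the
# lowest-carbon window of a grid-intensity forecast. Forecast values
# are made up for illustration.

# gCO2/kWh forecast for the next 8 hours (hypothetical)
forecast = [420, 410, 380, 210, 190, 230, 390, 430]
JOB_HOURS = 3                          # deferrable training job

def best_window(forecast, hours):
    """Return the start index minimizing average intensity over the job."""
    scores = [sum(forecast[i:i + hours])
              for i in range(len(forecast) - hours + 1)]
    return scores.index(min(scores))

start = best_window(forecast, JOB_HOURS)
naive = sum(forecast[:JOB_HOURS]) / JOB_HOURS            # run immediately
smart = sum(forecast[start:start + JOB_HOURS]) / JOB_HOURS
print(f"Run at t+{start}h: {smart:.0f} vs {naive:.0f} gCO2/kWh "
      f"({1 - smart / naive:.0%} lower)")
```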

DCIM Dashboard — Live Monitoring
10,000+ sensors · 1-sec polling · carbon-aware scheduling
Temperature Heatmap
Hot/cold aisle visible · predictive maintenance alerts
Power & UPS Monitoring
Per-outlet metering · battery health · ATS status
DCIM Platform: Schneider EcoStruxure / Vertiv Environet
Sensor Count: 10,000–50,000+ sensors · 1-second poll frequency
Carbon-Aware: Shift batch AI jobs to low-carbon grid windows
Scope 2 Savings: 20–30% reduction with scheduling software alone
Predictive Maint.: ML on sensor streams → catch failures before they happen
Uptime Target: Tier IV 99.995% = <26 min downtime/year
💡 Carbon-aware compute is Vectors' highest-fit opportunity — 20–30% Scope 2 reduction, zero hardware investment, direct fit with corporate decarbonization mandates.