Abstract: AI applications on GPU systems have exploded, with an almost unlimited appetite for compute. Single-chip inference performance has increased 1000x over the last 10 years, with improvements ranging from architecture to circuits. Tens of thousands of datacenter-connected GPUs are needed for training and inference of state-of-the-art generative AI models. Bandwidth density requirements increase on the order of 2x each generation, and power delivery is strained at all levels, from on-die to datacenter. This creates a critical need for future research addressing scaling and power reduction in all areas, including compute/memory, communication, power delivery, and heat removal. In this talk, we will explore what is ultimately needed for the design and deployment of these systems at the circuit and hardware level, including circuits for compute; electrical and photonic interconnect; memory system design and scaling; packaging and stacking; power delivery and conversion at the rack, board, and die level; and thermal design and limitations.
C. Thomas Gray received the B.S. degree in computer science and mathematics from Mississippi College, Clinton, MS, USA, and the M.S. and Ph.D. degrees in computer engineering from North Carolina State University, Raleigh, NC, USA. From 1993 to 1998, he was an Advisory Engineer with IBM, Research Triangle Park, NC, USA, working on transceiver design for communication systems. From 1998 to 2004, he was a Senior Staff Design Engineer with the Analog/Mixed Signal Design Group, Cadence Design Systems, working on SerDes system architecture. From 2004 to 2010, he was a Consultant Design Engineer with Artisan/ARM and Technical Lead of SerDes architecture and design. In 2010, he joined Nethra Imaging as a System Architect. In 2011, he joined NVIDIA, Inc., Durham, NC, USA, where he is currently Senior Director of Circuit Research, leading activities related to high-speed electrical signaling, photonics, power delivery, security circuits, low-energy and resilient memories, circuits for machine learning, and variation-tolerant clocking and power delivery. His work experience includes digital signal processing design and CMOS implementation of DSP blocks, as well as high-speed serial link communication systems, architectures, and implementation.