Whether deployed in a cloud data centre or on premises, a new breed of processors can increase throughput and reduce latency. However, processor advances are pushing power delivery boundaries, and power delivery now often limits the ability to extract peak processor performance.
Data centres in demand
The recent pandemic drove a surge in online shopping, media streaming and home working, with major hyperscale service and retail providers expanding capacity.
This data centre growth should be set against a backdrop of several other drivers. The major technology-driven trends of the last decade include the IoT, artificial intelligence (AI), machine learning at the edge and the exponential growth of operational technology (OT) workloads. Industrial operational performance improvement initiatives such as Industry 4.0 have caused a dramatic rise in OT deployments. These factors have not only necessitated more compute capacity but also led to more diverse and demanding workloads.
According to Mordor Intelligence, a market research company, the hyperscale data centre market will grow at a CAGR of 4.85% during 2021-2026.
There is also demand for data centres to offer a flexible and scalable compute infrastructure capable of supporting highly dynamic workloads. These compute tasks involve low-latency processing, spiking neural network algorithms and search acceleration. Specialised and optimised processing devices such as FPGAs, GPUs and neural processing units (NPUs), once rarely used in a data centre, have now become commonplace. There is also a new breed of ASICs, such as clustered AI neural network inference engines, in demand for high-performance computing tasks.
Advances in processor technology enable high-performance computing to stretch the boundaries of task throughput, offering the agility to accommodate greater workload diversity. Technology gains, however, often depend on other aspects of the system advancing in step.
Thermal challenges
In the semiconductor industry, change is inevitable. No sooner has a new, smaller silicon process node come into production than the next iteration is not far behind.
Smaller geometries permit fabricating more individual semiconductor gates in a given space. Although 65nm and 55nm process nodes are still routinely used for many ICs, high-performance computing devices such as ASICs, FPGAs, GPUs and NPUs are typically based on process nodes of 12nm or less, with 7nm and 5nm becoming increasingly popular and 3nm devices imminent.
Increasing the density of individual gates by reducing their dimensions highlights the constraints of managing the thermal characteristics of processors. Reducing gate working voltage, or voltage scaling, helps reduce the heat dissipation of each transistor, but thermal management of the complete package remains paramount.
Typically, a high-performance processor will run at its maximum clock rate until thermal limits require it to be reined back. Voltage scaling has seen core voltages drop to 0.75V for the most sophisticated 5nm process node-based devices and will fall further to a predicted 0.23V for 3nm process node devices. To further complicate the power delivery challenge, many devices require multiple rails of different voltage levels, sequenced carefully to avoid permanent damage.
With the tens of billions of transistors typically found in a leading-edge GPU, the current demands become immense, running into many hundreds of amps. A 1,000A requirement is not uncommon for a clustered AI processor. The trend is for a processor's power consumption to double every two years.
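As a rough illustration of how voltage scaling multiplies current demand, the relationship I = P / V shows the scale of the problem. The power and voltage figures below are assumptions chosen purely for illustration, not taken from any specific device datasheet:

```python
# Illustrative only: hypothetical figures showing how voltage scaling turns a
# fixed package power budget into very large supply currents (I = P / V).
def core_current(power_w: float, core_voltage_v: float) -> float:
    """Current drawn on the core rail for a given package power."""
    return power_w / core_voltage_v

# A hypothetical 500W accelerator on a 0.75V core rail (5nm-class device)
print(round(core_current(500, 0.75)))   # ~667 A

# The same 500W budget on a predicted 0.23V rail (3nm-class device)
print(round(core_current(500, 0.23)))   # ~2174 A
```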
Another aspect of delivering power to such power-hungry devices is that their workloads can change within microseconds, potentially producing large transients across the power delivery network (PDN).
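A minimal sketch of why microsecond-scale load changes stress the PDN uses the familiar relationship v = L·di/dt. The load step, edge time and parasitic inductance below are hypothetical values chosen only to indicate the order of magnitude:

```python
# Illustrative only: voltage excursion caused by a fast load step across the
# parasitic inductance of the power delivery path, v = L * di/dt.
def transient_excursion(load_step_a: float, step_time_s: float,
                        parasitic_inductance_h: float) -> float:
    """Instantaneous voltage deviation seen at the processor rail."""
    return parasitic_inductance_h * (load_step_a / step_time_s)

# Hypothetical 200A load step in 1 microsecond across 1nH of loop inductance
print(transient_excursion(200, 1e-6, 1e-9))   # 0.2 V - a huge excursion on a 0.75V rail
```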
Power delivery challenges
Power delivery and power efficiency have become the largest concerns in large-scale computing systems. The industry has witnessed a dramatic increase in the power consumed by processors with the advent of ASICs and GPUs processing complex AI functions. Rack power has risen accordingly as AI capability is deployed in large-scale learning and inferencing applications. In most cases, power delivery is now the limiting factor in computing performance as new CPUs consume ever-increasing currents. Power delivery entails not just the distribution of power but also efficiency, size, cost and thermal performance.
Advances in semiconductor process technologies introduce several challenging conditions for the PDN, and not all of them are electrical. For example, these processing devices physically occupy a considerable proportion of the available board space, which is typically limited to an industry-standard form factor.
To further exacerbate size constraints, the nature of high-performance compute devices requires supporting ICs, such as memory and optical transceivers, to be placed close to the processor. This approach also applies to point of load (PoL) power regulators because of the dramatic increases in current consumption and the reduced core voltages. At high currents, PCB trace resistance creates I²R losses and a voltage drop large enough to degrade processor performance or, worse, cause erratic behaviour. PoL regulators also need to be highly power-efficient to prevent further thermal management complications.
The combination of space-constrained boards and the need to mount regulators close to the processor demands a new and innovative approach to architecting the PDN.
The limits of PDN
Architecting an efficient PDN presents a set of three significant and inter-related challenges for power systems engineers.
The first is increasing current density. High-performance processors can consume hundreds of amps. Getting sufficient power to the processor involves not only the physical constraints of where to place PoL converters, but also complex decisions about routing power across the PCB from the edge connectors to the converter. Large voltage transients resulting from highly dynamic workloads can also interfere with other system components.
The second challenge is to improve power efficiency. There are two influencing factors: I²R losses and conversion efficiency. PCB tracks are ideal for routing low-voltage signals and digital logic, but at high currents even short tracks can represent significant resistive losses. These I²R losses lower the voltage supplied to the processor and can cause localised heating. With hundreds of other components on a processor card, there is a limit to the size of power supply tracks, so placing the converter as close as possible to the processor is the only viable alternative.
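To put the I²R argument in concrete terms, a back-of-envelope calculation shows how quickly trace losses and voltage drop become untenable at sub-1V core rails. The resistance and current below are illustrative assumptions, not measured figures:

```python
# Illustrative only: resistive loss and voltage drop in a power distribution
# trace, P_loss = I^2 * R and V_drop = I * R. Values are assumed, not measured.
def trace_penalty(current_a: float, resistance_ohm: float):
    """Return (power dissipated in the trace, voltage dropped across it)."""
    return current_a ** 2 * resistance_ohm, current_a * resistance_ohm

# A hypothetical 0.5 milliohm path carrying 400A to the processor
loss_w, drop_v = trace_penalty(400, 0.0005)
print(loss_w, drop_v)   # 80.0 W dissipated, 0.2 V dropped - untenable on a sub-1V rail
```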
The converter’s power efficiency is an attribute of its design. The development of high-efficiency PoL converters involves an iterative approach to understanding losses in every component, from passives to semiconductors. These losses manifest as heat, which requires dissipation. PoL converter module designers need to optimise the module’s internal design to achieve an isothermal package.
The final challenge is to maintain PDN simplicity. Some power architects opt to create a discrete PoL converter for the processor to customise the PDN. However, this adds complexity. A discrete design increases the bill of materials, with the need to source more components and the associated logistics and supply chain costs. This approach also requires more engineering effort, increasing non-recurring engineering costs and extending the development and testing timeline.
The alternative is a modular approach. Thermally adept, integrated power modules simplify the power design significantly, reducing the bill of materials, adding flexibility for changes and expediting development. Power modules are compact, power dense and easy to scale up or down.
For legacy systems where more efficiency and power are needed, a bidirectional non-isolated bus converter, such as Vicor’s NBM, enables efficient conversion from 48V to 12V and vice versa, integrating a legacy board into a 48V infrastructure or the latest GPU into a legacy 12V rack.
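The benefit of a 48V distribution bus over a legacy 12V bus follows from simple arithmetic: for the same delivered power, quadrupling the bus voltage cuts the current by 4x and the I²R distribution loss by 16x. The power level and path resistance below are assumptions for illustration only:

```python
# Illustrative only: for the same delivered power, a 48V bus carries a quarter
# of the current of a 12V bus, so I^2R distribution losses fall by 16x.
def bus_current_and_loss(power_w: float, bus_voltage_v: float,
                         path_resistance_ohm: float):
    """Return (bus current, power lost in the distribution path)."""
    current = power_w / bus_voltage_v
    return current, current ** 2 * path_resistance_ohm

# A hypothetical 3kW shelf fed over a 2 milliohm distribution path
print(bus_current_and_loss(3000, 12, 0.002))   # (250.0 A, 125.0 W lost)
print(bus_current_and_loss(3000, 48, 0.002))   # (62.5 A, ~7.8 W lost)
```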
For 48V to PoL delivery, a module such as Vicor's Power-on-Package (PoP) can reduce motherboard resistance by up to 50x and the processor power pin count by more than 10x. This package uses a factorised power architecture with lateral and vertical power delivery to eliminate power distribution losses.
The demands of data centres, edge compute and IoT are not subsiding. Big data needs to be processed quicker than ever; today’s maximum processing speeds will be too slow nine months from now. This will bring power delivery into focus again. Finding new ways to increase throughput and reduce latency is a perpetual challenge, but flexible and scalable modular power will complete the puzzle to minimise redesigns and ease future modification.