The case study unveils some details of developing a powerful computing system
At Sintecs, engineers from different teams, such as PCB layout and Signal / Power Integrity, work closely together. We arrive at an optimal design by constantly checking, from different areas of expertise, whether we are on the right track. Because our engineers give each other continuous feedback, the design is adjusted where necessary. This method ensures that problems are solved or even prevented.
The project – dReDBox
dReDBox is a Horizon 2020 EU project that aims to use data center resources more efficiently, saving space and energy. It is based on a new hardware concept in which a pool of disaggregated computing, memory and acceleration resources is used instead of a fixed server configuration.
dReDBox contains hardware building blocks, referred to as “dBRICKs”. Sixteen dBRICKs are stacked on the dTRAY, a connecting board with a high-speed electrical network that supports extremely low-latency transactions from one dBRICK's memory to another. The dTRAY has three outgoing networks: an optical network, PCIe and Ethernet.
To make dReDBox easy to use for programmers, the platform supports virtual machines (VMs), with resources configured on the fly to match user requirements.
Challenges for Sintecs
A disaggregated, high-performance computing platform is a complex, high-end system. Our engineers designed it from scratch, with a completely new architecture, a different physical distribution of processors and high-speed memory, and high-speed board-to-board interconnects. During the design phase of the dReDBox hardware we encountered three challenges.
1. The speed of the DDR4-memory
The dBRICKs rely on stable DDR4 memory for fast, reliable operation. In a design like this, the timing margins are so tight that the exact physical configuration has a significant impact on the maximum memory speed. Timing analysis showed that it would be fundamentally impossible to run the memory at its maximum speed of 2400 MT/s if we kept to the PCB layout guidelines of the Zynq UltraScale+ MPSoC. To break this deadlock, we asked Xilinx for more in-depth information. They cooperated and provided us with timing details that allowed us to optimize the PCB layout.
Hardware verification on the first run of PCBs confirmed that we can indeed use the DDR4 memory at its maximum speed, over a broad temperature range, without timing violations or EMI problems.
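To give a feel for how tight these margins are, the following minimal Python sketch computes the duration of one data transfer at 2400 MT/s and the share of it consumed by a trace length mismatch. It is an illustration only, not the timing analysis used in the project: the 5 mm mismatch, the relative permittivity of 4.0 and all function names are assumptions, and the delay formula applies only to a trace fully embedded in a uniform dielectric.

```python
import math

SPEED_OF_LIGHT_MM_PER_PS = 0.299792458  # speed of light in vacuum, mm per picosecond


def unit_interval_ps(transfer_rate_mt_s: float) -> float:
    """Duration of one data transfer in picoseconds for a rate given in MT/s."""
    return 1e6 / transfer_rate_mt_s


def propagation_delay_ps_per_mm(relative_permittivity: float) -> float:
    """Delay per millimetre for a trace fully embedded in a uniform dielectric."""
    return math.sqrt(relative_permittivity) / SPEED_OF_LIGHT_MM_PER_PS


def skew_fraction_of_ui(length_mismatch_mm: float,
                        relative_permittivity: float,
                        transfer_rate_mt_s: float) -> float:
    """Fraction of one unit interval consumed by a trace length mismatch."""
    skew_ps = length_mismatch_mm * propagation_delay_ps_per_mm(relative_permittivity)
    return skew_ps / unit_interval_ps(transfer_rate_mt_s)


if __name__ == "__main__":
    # Assumed example values (not dReDBox design data): 5 mm mismatch, Dk = 4.0.
    ui = unit_interval_ps(2400)
    fraction = skew_fraction_of_ui(5.0, 4.0, 2400)
    print(f"Unit interval: {ui:.1f} ps, skew share: {fraction:.1%}")
```

At 2400 MT/s one transfer lasts about 417 ps, so under these assumed values a 5 mm mismatch alone uses roughly 8% of it, before setup and hold times, jitter and crosstalk are taken into account.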
2. Thermal analysis
To avoid thermal issues, we performed thermal analysis and verification early in the design phase. We analyzed the individual dBRICKs and the complete dReDBox system, which dissipates an estimated 750 W. We identified hot spots, looked for zones of flow stagnation and identified overstressed components. The results of the thermal analysis served as input for the mechanical and electrical design, reliability predictions and stress analyses.
Thermal modelling allowed us to optimize the airflow, the temperature distribution, and the size and number of fans. It also gave insight into the impact of configuration changes and fan failures.
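As a rough sanity check on the scale of the cooling problem, the airflow needed to carry away a given power follows from the heat balance P = ρ · c_p · Q · ΔT. The Python sketch below applies it to the estimated 750 W. It is a first-order estimate, not a substitute for the thermal modelling described above: the 15 K allowed air temperature rise is an assumed example value, the air properties are typical room-condition figures, and the function name is hypothetical.

```python
AIR_DENSITY_KG_M3 = 1.2            # typical value near room temperature at sea level
AIR_SPECIFIC_HEAT_J_KG_K = 1005.0  # typical value for dry air
CFM_PER_M3_S = 2118.88             # cubic feet per minute in one cubic metre per second


def required_airflow_m3_s(power_w: float, allowed_air_temp_rise_k: float) -> float:
    """Volumetric airflow needed to remove power_w with the given air temperature rise."""
    if allowed_air_temp_rise_k <= 0:
        raise ValueError("allowed_air_temp_rise_k must be positive")
    return power_w / (AIR_DENSITY_KG_M3 * AIR_SPECIFIC_HEAT_J_KG_K * allowed_air_temp_rise_k)


if __name__ == "__main__":
    # 750 W comes from the text; the 15 K rise is an assumption for illustration.
    flow = required_airflow_m3_s(750.0, 15.0)
    print(f"{flow:.4f} m^3/s ({flow * CFM_PER_M3_S:.0f} CFM)")
```

With the assumed 15 K rise this gives about 0.041 m³/s, or roughly 88 CFM. Halving the allowed temperature rise doubles the required airflow, which is one reason fan count and fan failure scenarios matter.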
3. Keep power dips and interference out
To ensure power integrity and the absence of noise on the delivered power, we ran a signal and power integrity co-simulation. We determined the optimal decoupling, experimented with the PCB stack-up, and examined many aspects of PCB technology, such as materials, dimensions, tracks and vias. We selected a solution that meets the power requirements yet avoids exotic materials and processes that drive up costs.
To mitigate the influence of the fast-switching, high-speed interfaces on the dBRICKs and to avoid problems in the fifteen power rails, we adapted the layout of the planes and the decoupling capacitors so that they quickly deliver the large currents resulting from fast signal switching.
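Decoupling is often sized with a first-order rule of thumb: the power delivery network should stay below a target impedance equal to the allowed ripple voltage divided by the transient current. The Python sketch below shows that rule together with a simple series R-L-C model of a single capacitor. It is not the co-simulation described above, and the rail voltage, ripple, current and capacitor values are all assumed example numbers, as are the function names.

```python
import math


def target_impedance_ohm(rail_voltage_v: float,
                         allowed_ripple_fraction: float,
                         transient_current_a: float) -> float:
    """Rule-of-thumb PDN target impedance: allowed ripple voltage / transient current."""
    return rail_voltage_v * allowed_ripple_fraction / transient_current_a


def capacitor_impedance_ohm(frequency_hz: float, capacitance_f: float,
                            esr_ohm: float, esl_h: float) -> float:
    """Impedance magnitude of a capacitor modelled as a series R-L-C circuit."""
    omega = 2 * math.pi * frequency_hz
    reactance = omega * esl_h - 1 / (omega * capacitance_f)
    return math.hypot(esr_ohm, reactance)


def self_resonant_frequency_hz(capacitance_f: float, esl_h: float) -> float:
    """Frequency at which the capacitor's impedance drops to its ESR."""
    return 1 / (2 * math.pi * math.sqrt(esl_h * capacitance_f))


if __name__ == "__main__":
    # Assumed example rail: 0.85 V, 3% ripple, 10 A transient (not dReDBox data).
    z_target = target_impedance_ohm(0.85, 0.03, 10.0)
    # Assumed example capacitor: 100 nF, 10 mOhm ESR, 0.5 nH ESL.
    f_res = self_resonant_frequency_hz(100e-9, 0.5e-9)
    z_cap = capacitor_impedance_ohm(f_res, 100e-9, 0.010, 0.5e-9)
    print(f"target {z_target * 1e3:.2f} mOhm, capacitor {z_cap * 1e3:.2f} mOhm "
          f"at {f_res / 1e6:.1f} MHz")
```

In this assumed example a single capacitor cannot reach the target even at its own resonance, which illustrates why plane layout and a combination of many capacitors both matter.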
No budget and no time for a redesign
A PCB design iteration for a high-end system like this costs around fifty thousand euros per round. The only way we could finish the dReDBox design on time and within budget was to get the design right the first time. And we managed that.
How did we get a complex design like dReDBox up and running on the first round of PCBs? Whenever we made a design choice, we immediately analyzed it and acted on the analysis results, over and over as we were designing. We iterated, analyzed and optimized the hardware schematics. We could say that our way of working pays off.