The DDR4 challenge: save the prototype!
One day we received an SOS call from Erik*, who works at a company that develops communication systems.
Erik’s team was testing their latest prototype hardware, including an NXP QorIQ processor and DDR4 memory. They ran into timing issues and could not get the hardware up and running.
We investigated the board and found DDR4 timing issues using signal integrity (SI) analysis. We then built a board support package (BSP) with workarounds. The result? Erik is relieved: “In the end, we had the board up and running within a day.”
Designing in DDR4 for the first time
Erik’s team built their first design using DDR4 memory. They were careful to keep the DDR4 SDRAM implementation details close to reference designs. To avoid potential issues, they closely followed NXP application notes.
PCB design was done with the help of external PCB consultants, who did everything they could to keep crosstalk under control using the Altium layout tools. To compensate for the lack of simulation tools, they chose a 12-layer PCB stack-up and paid close attention to line width, material parameters, line impedance, and termination resistors.
As Erik tells us: “When we called, we had been trying to get our hardware up and running, without success. We felt a little blindfolded, and didn’t know how to proceed. We were seeking the help of someone who had done it before, and found Sintecs through the NXP website.”
Before the call, Erik’s team had already done quite an extensive investigation. They had identified some design flaws and implemented workarounds that allowed them to boot their boards and run a few memory tests. The main issue at the time of the call was a failing clock centering/adjustment test.
“During the initial e-mails and a short call, we got a good impression: everyone at Sintecs understood the challenges we were facing. We decided to ask them for help with troubleshooting.”
Finding the root cause
After signing the NDA, we first focused on the signal integrity analysis of the DDR4 interface. Based on the analysis and simulation results, we reached essentially the same conclusion as Erik’s team had with the prototype: the address bus and the clock would most likely not work.
The nice thing about signal integrity analysis is that it does not just confirm that the problem is likely to occur. It also tells you where it comes from, based on simulations.
This eye diagram for a typical case at a speed of 800 Mb/s shows a ringback crossing the threshold. That causes a failure in the address signal: it reduces the timing window and makes it impossible to access the DDR4 memory at the desired 800 Mb/s.
What causes the ringback? The simulations show that the conventional vias do. Every via in a PCB causes an abrupt impedance change, and that change reflects part of the signal. When designing a PCB with DDR3 or DDR4 memory, it is common practice to connect multiple memory chips in a fly-by topology. The signal then travels through multiple vias, which amplifies the reflections and causes effects such as the ringback shown above. In that case, you need to choose the topology carefully.
You can simulate the reflections at every via (conventional or microvia) using signal integrity tools, for instance those from Mentor Graphics. The simulation shows how the signal artefacts influence the surroundings of the via. The solution often lies in the right mix of via types and topology: which kinds of vias do you choose, and where do you place the vias and components on the PCB? Unfortunately, it is not as simple as saying “just choose this type of via and you’ll never see ringback again”. Sometimes one choice is better because space is limited; another time a different choice is better because it makes routing across multiple layers easier. In this particular case, the conventional vias are the root cause of all the issues, and simulations show that microvias and buried vias can mitigate the effects of the reflections.
What we often notice when a design needs troubleshooting is that the engineer has kept using a way of working that was always good practice, but no longer is. They have continued to use the technologies that have always served them well, such as the conventional vias in this case. However, as signal edges get steeper and everything on the board gets smaller, a technology update, such as the first use of DDR4 here, may cause good old techniques or ways of working to fail unexpectedly. Unfortunately, that only comes to light after the prototype has been made. The engineer then finds out the hard way that they should have done far more analysis and simulation earlier in the project.
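To make “a ringback crossing the threshold” concrete, here is a minimal Python sketch that scans a sampled rising edge and reports where the signal, after first rising above an input-high threshold, dips back below it. The function name `ringback_dips`, the 0.7 V threshold and the sample values are hypothetical and chosen for illustration only; they are not taken from the JEDEC DDR4 specification or from the simulations described here.

```python
def ringback_dips(samples, v_ih):
    """Return the sample indices where a rising signal, after having reached
    the input-high threshold v_ih, falls back below it (ringback).

    samples: voltages in volts, in time order (hypothetical data)
    v_ih:    input-high threshold in volts (assumed value)
    """
    dips = []
    above = False  # True while the signal is at or above v_ih
    for i, v in enumerate(samples):
        if v >= v_ih:
            above = True
        elif above:
            # Signal was valid-high and has rung back below the threshold.
            dips.append(i)
            above = False
    return dips


# Hypothetical rising edge with overshoot followed by ringback.
edge = [0.0, 0.4, 0.9, 1.3, 0.6, 1.0, 1.2]
print(ringback_dips(edge, v_ih=0.7))  # → [4]
```

A clean, monotonic edge returns an empty list; any index in the result marks a point where a receiver sampling at that moment could see the wrong logic level, which is what shrinks the eye.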
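The size of a via reflection follows from the standard reflection coefficient, Γ = (Z₂ − Z₁)/(Z₂ + Z₁). The sketch below is a simplified, lossless model that treats each via as a short section of different impedance and ignores multiple internal reflections. It illustrates how the first-pass amplitude drops with every via the signal crosses, as in a fly-by chain; the amplitude that is “lost” is reflected and returns later as ringing. The 40 Ω trace and 30 Ω via impedances are assumed values for illustration, not figures from Erik’s board.

```python
def reflection_coefficient(z_from, z_to):
    """Voltage reflection coefficient at a step from z_from to z_to (ohms)."""
    return (z_to - z_from) / (z_to + z_from)


def via_transmission(z_trace, z_via):
    """First-pass voltage transmission through one via, modelled as a short
    section of impedance z_via inside a z_trace line. Re-reflections inside
    the via are ignored in this simplified model."""
    t_in = 1 + reflection_coefficient(z_trace, z_via)   # trace -> via
    t_out = 1 + reflection_coefficient(z_via, z_trace)  # via -> trace
    return t_in * t_out


def chain_transmission(z_trace, z_via, n_vias):
    """First-pass transmission through n identical vias in series."""
    return via_transmission(z_trace, z_via) ** n_vias


# Assumed impedances: 40-ohm trace, 30-ohm via.
for n in (1, 2, 4, 8):
    print(n, round(chain_transmission(40.0, 30.0, n), 3))
```

With matched impedances (z_via equal to z_trace) the transmission stays at 1.0 regardless of the number of vias, which is the intuition behind choosing via types and stack-ups that keep the impedance step small.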
What about next time? A prototype without any trouble, please…
Simulations showed that the conventional vias cause the ringback. We advised Erik’s team to replace them with microvias and buried vias in the next design round, to minimize reflections from the vias. A redesign of the board is the only way to get rid of the issues completely. Because it performed better in simulation, we also advised a different PCB stack-up for the next prototype.
Using the prototype against the odds
Did we give up on the prototype after concluding this? Far from it. Our software engineers built a board support package (BSP) that allowed full use of the board, at the cost of accessing the DDR4 memory at the lowest speed. At a lower speed, the reflections themselves hardly change. What does change is the time between two clock edges: the interval between the ringback crossing the threshold (which the system interprets as a clock signal) and the start of the next clock edge gets longer. That leaves enough time to read the memory reliably between the two edges.
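The effect of lowering the speed can be expressed as simple arithmetic: the unit interval is the inverse of the data rate, so halving the rate doubles the time available once the ringback has settled. The following sketch assumes a fixed ringback settling time and a fixed setup-plus-hold requirement; both numbers, the function names and the 400 Mb/s comparison rate are hypothetical and not the actual values from this board or its BSP.

```python
def unit_interval_ns(data_rate_mbps):
    """Bit period (unit interval) in ns for a data rate in Mb/s."""
    return 1000.0 / data_rate_mbps


def timing_margin_ns(data_rate_mbps, settle_ns, setup_hold_ns):
    """Time left in one unit interval after the ringback has settled and the
    receiver's setup/hold window is reserved. Negative means failure.
    settle_ns and setup_hold_ns are assumed, illustrative values."""
    return unit_interval_ns(data_rate_mbps) - settle_ns - setup_hold_ns


# Hypothetical: ringback settles 0.9 ns after the edge, 0.4 ns setup+hold.
for rate in (800, 400):
    margin = timing_margin_ns(rate, settle_ns=0.9, setup_hold_ns=0.4)
    status = "pass" if margin >= 0 else "fail"
    print(f"{rate} Mb/s: UI={unit_interval_ns(rate):.2f} ns, "
          f"margin={margin:.2f} ns ({status})")
```

Under these assumptions, the reflections are identical at both rates; only the unit interval grows, turning a negative margin at 800 Mb/s into a positive one at the lower rate.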
Though the board does not perform as planned, it can be used for general hardware and software test and verification. Erik clarifies: “Not only did Sintecs solve the DDR4 timing issues, they also helped with the Board Support Package, to quickly get the board up and running when we had issues with internal resource capacity.”
Erik’s team had their first application up and running within a day after we delivered the BSP. “The small incremental deliveries helped us to develop the prototype step by step and helped us to meet a deadline that seemed totally impossible when we first called.”
Erik had already accepted the need for a redesign based on his team’s initial investigations and the issues they found there. Our analysis helped identify more issues in the existing prototype than they could have identified themselves. Although another prototype is a tough bullet to bite, Erik is happy knowing that the next prototype will fix the issues.
“Sintecs gave the prototype boards more value than we expected when we called. Ultimately we could get the boards up and running, and were able to test all the main parts of our prototype design. For that reason we’ll give them the assignment for the next prototype, earlier in the design. Sintecs will make the next PCB design with signal integrity and timing analysis to avoid issues before they arise.”
*For reasons of discretion, we do not use our customer’s real name.
What can you do if your prototype doesn’t work?
We have a standard approach for finding issues in a prototype. You may try this yourself:
Check the schematics and the PCB layout.
Carry out a signal integrity analysis and timing analysis on common trouble spots such as the DDR4 interface or other high-speed interfaces.
Carry out a power integrity analysis on the power rails, for instance those of the memory.
Review the FPGA code, low-level software and/or choice of settings.
Our tip: always ask for the help of someone who can look at your design with fresh eyes.
Questions about DDR4?
Are you looking for feedback on a design?
Do you have trouble with a prototype?
Leave us a message to start the conversation.