Blog
Inside Portable Stimulus: Verification Efficiency
If verification were as hot a topic as artificial intelligence (AI), we would be measuring things like effective verification cycles per watt. Unfortunately, the only things that ever seem to be measured in the verification world are the engines on which simulations are performed. We never actually measure the effectiveness of verification itself, probably because there was, for many years, only one methodology in the industry –– constrained random stimulus generation.
Sure, there were competing libraries to help implement this methodology, but most of the time the jockeying among them was more political than technical. All of them provided enough of a gain over the then-incumbent methodology of directed testing. Everyone was happy.
During the golden years of simulation, vendors would trade positions for the fastest Verilog, VHDL, or mixed-language simulator, and then adjust their pricing such that simulation cycles per dollar were equalized. One vendor might add a support tool that provided extra value for a while, but the others would quickly replicate it and push it back down into the commodity category.
End customers loved constrained random because of the gain it provided: verification engineers could spend their valuable effort on creating a model from which stimulus could be generated, rather than tediously writing and maintaining directed tests. The simulation vendors loved it as well, because those same verification engineers could now effectively handle ten times the simulation resources they could in the past. Everyone was happy. Until…
As designs got larger and more complex, the effectiveness of constrained random dropped precipitously. Simulator performance stagnated because vendors were unable to make effective use of multi-core processors. Emulators came to the rescue, providing the capacity and speed required, but they had one big problem: when you hooked them up to a constrained random testbench, the emulator slowed to a crawl, because the methodology required the simulator/emulator to stop every cycle and ask the testbench what it should do next. The testbench could not run on the emulator, so performance gains were limited by Amdahl's law.
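To see why Amdahl's law bites so hard here, consider a back-of-the-envelope calculation in Python. The numbers are hypothetical, chosen purely to illustrate the effect: if the testbench portion that must stay in simulation consumes 40% of wall-clock time, no emulator, however fast, can deliver more than a 2.5x overall speedup.

def amdahl_speedup(serial_fraction, accelerated_speedup):
    # Overall speedup when only (1 - serial_fraction) of the run is accelerated.
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / accelerated_speedup)

# Hypothetical split: the constrained random testbench takes 40% of wall-clock
# time and cannot run on the emulator; the DUT portion is accelerated 1000x.
print(amdahl_speedup(0.40, 1000.0))  # ~2.5x overall, nowhere near 1000x
print(amdahl_speedup(0.40, 1e9))     # the ceiling is 1 / 0.40 = 2.5x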
Getting the most out of the emulator required a new methodology. What about running software on the emulated model of hardware? For some, this was quite an easy sell because it fit into the notion of shift left –– move software development and integration to be pre-silicon rather than post-silicon. The users loved it because it took the risk out of their hardware development and the vendors loved it because they could sell their emulators both to the hardware team and the software team.
Along the way, the industry lost track of verification efficiency. Every now and again, verification effectiveness was questioned. Do we need to be running all of these tests? Does this test add anything to total coverage?
Running production software also adds risk, in that it hides certain behaviors. What happens if the software changes in the future and reveals a hardware problem? Sure, it can be fixed in the software, but at what cost?
Jumping from block-level constrained random test patterns in simulation to running production software on the complete hardware leaves a large gap in the middle that is not being effectively verified. This is a gap that test synthesis utilizing the Accellera Portable Stimulus Standard (PSS) is perfectly suited to fill.
When a system is first assembled from individual blocks, the engineer can reasonably assume that the blocks are bug-free. Static connectivity checks can also ensure that the blocks have been stitched up correctly. Now the task is to find out if all of the blocks can talk to each other and do that in a manner that has no unintended consequences. Not only can A talk to B, but when A talks to B, does it slow down or interfere with C interacting with D? Software is not good at systematically performing this type of verification.
Once a PSS model exists for a system, both of these tasks become tractable, and test synthesis works out how to make them happen. If A is meant to be able to talk to B, test synthesis can generate a test that makes that happen, and it can generate tests in which every block talks to every other block. Each such test is a path through the PSS graph, and the graph also acts as an explicit coverage model for the task. A good test synthesis engine can optimize these tests such that all paths are covered in the minimum number of tests. Now we actually have a real metric for verification efficiency and effectiveness.
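As a rough illustration of the idea in Python (this is not PSS syntax and not Breker's engine; the block names and the greedy packing heuristic are invented here), the coverage model can be treated as the set of ordered block-to-block paths, with a packer placing non-conflicting paths into the same test so that everything is covered in a handful of runs.

from itertools import permutations

# Hypothetical block list; in a real flow this comes from the PSS model.
blocks = ["cpu", "dma", "usb", "ddr"]

# Coverage model: every ordered pair "A talks to B" is a path to be exercised.
paths = list(permutations(blocks, 2))

def pack_tests(paths):
    # Greedily pack transfers whose endpoints are free into the same test,
    # covering all paths in few tests (a simple heuristic, not an optimal one).
    remaining, tests = list(paths), []
    while remaining:
        busy, test = set(), []
        for src, dst in list(remaining):
            if src not in busy and dst not in busy:
                test.append((src, dst))
                busy.update((src, dst))
                remaining.remove((src, dst))
        tests.append(test)
    return tests

for i, test in enumerate(pack_tests(paths), 1):
    print(f"test {i}: " + ", ".join(f"{s}->{d}" for s, d in test))

Covered paths divided by total paths in the graph is the explicit, countable metric referred to above.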
But testing each of those paths on its own does not verify the other aspect: does A talking to B impact anything else in the system? This is where test scheduling comes into play. Systems do not perform one task at a time; they support concurrency, and this is where potential side effects or unintended consequences can lurk. Again, this is a place where tools can have a huge impact on the quality and efficiency of the tests created. As an example, memory is a resource shared between tasks. How should memory be assigned to them? A random start address might seem like a good idea, but in fact it is perhaps the worst thing you could do.
Breker is proud to say that we have developed the world's worst memory allocator! A good memory allocator might hand out addresses sequentially, which is very easy for the hardware to handle. The Breker allocator, by contrast, is very greedy about reusing memory, so we get lots of address conflicts, and it intentionally picks values that are likely to create difficulties, such as misaligned addresses. The goal is not to constrain allocation to expected use cases, because we do not know what the software will do; it is to pick addresses that are difficult for the hardware.
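A minimal sketch in Python of what such a deliberately hostile allocator might look like (an illustration of the idea only, not Breker's implementation; the class and its parameters are invented here):

import random

class StressAllocator:
    # Eagerly recycles freed regions and nudges addresses off natural alignment,
    # so back-to-back transfers collide and the hardware is stressed.
    def __init__(self, base=0x8000_0000, seed=1):
        self.next_free = base
        self.recycled = []          # recently freed regions, reused first
        self.rng = random.Random(seed)

    def alloc(self, size):
        if self.recycled:           # greedy reuse -> likely address conflicts
            addr, _ = self.recycled.pop()
        else:
            addr = self.next_free
            self.next_free += size
        return addr + self.rng.choice([0, 1, 2, 3, 5, 7])   # deliberately misalign

    def free(self, addr, size):
        self.recycled.append((addr, size))

allocator = StressAllocator()
a = allocator.alloc(256)
allocator.free(a, 256)
b = allocator.alloc(256)            # lands back on (or near) the region just freed
print(hex(a), hex(b))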
Later in the verification flow, users can replace the allocator with their own algorithms that produce the more typical patterns their software might create, or patterns they know will stress their hardware. For example, they can tell the scheduler to pick a random address, but require it to be 32-bit aligned or to have a stride of 64 bytes because they have an array processor.
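A user-supplied replacement might look something like the following sketch, trading hostility for a realistic access pattern; again, the interface and the numbers are illustrative assumptions, not a real tool API:

import random

class StridedAllocator:
    # Random placement within a window, but every address lands on a 64-byte
    # stride, mimicking what an array processor's software would actually do.
    def __init__(self, base=0x9000_0000, window=0x0100_0000, stride=64, seed=7):
        self.base, self.window, self.stride = base, window, stride
        self.rng = random.Random(seed)

    def alloc(self, size):
        slots = (self.window - size) // self.stride
        return self.base + self.rng.randrange(slots) * self.stride

allocator = StridedAllocator()
addresses = [allocator.alloc(4096) for _ in range(3)]
print([hex(a) for a in addresses], all(a % 64 == 0 for a in addresses))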
There is another advantage to the utilization of test synthesis at this stage of verification. In a recent blog, Adnan Hamid, founder and CEO of Breker Verification Systems, presented an example of test efficiency for the verification of the memory subsystem of an Arm processor. Figure 1 shows a cache test developed by Carbon Design Systems, a company that Arm acquired in 2015. As you can see, there is a lot of whitespace in this chart, indicating cycles that were not adding to the verification task.

Figure 2 shows what happens when test synthesis tackles the same problem.

Not only does this produce much higher coverage, but it makes use of almost every simulation cycle. And remember, the test in Figure 1 was developed specifically to exercise this capability; real software would be a lot less efficient than that. Now translate this into how much you are paying for those simulation seats or emulators, and you will quickly see why going to software too early is both ineffective and inefficient.
Portable stimulus and test synthesis can have a huge impact on verification efficiency and effectiveness, and users can save a lot of time and money by utilizing this methodology before they resort to running production software. We do not advocate stopping the early bring-up of software. A test synthesis methodology based on PSS can deliver higher quality, faster, and with significantly reduced risk.