Nobody puts a verification job onto an emulator without first making sure that the testcase is efficient and will make good use of such a valuable resource. In the last blog, we showed how to take a Portable Stimulus model and, using testbench synthesis technology, migrate a test from a transactional Universal Verification Methodology (UVM) environment to one that generates code running on the embedded processors of a design. These processors can be instantiated in the emulator –– wait, what? This is where you hear the needle being scratched across the record.
It is significantly more complicated than that, and yet this is exactly why Portable Stimulus (PSS) was created. It was too difficult to migrate a testbench developed for a simulator onto an emulator. The standard says nothing about how this should be done; that is left as an exercise for the PSS tool developer. PSS defines a language in which scenarios can be described, allowing testbench synthesis technology to target multiple platforms and abstractions. Those targets have some specific requirements. Let’s take this one step at a time.
Emulators are sized by the number of gate equivalents they can handle, similar to saying an FPGA has a certain gate capacity. There is a transformation process that takes your register transfer level (RTL) code and makes it suitable for an emulator or FPGA. Each vendor may be more or less efficient with certain types of design. Increasing the size of your emulator is slow and expensive, so you have to be careful about what you use those emulation resources for. If your design has four processors, do you really want to map them into the emulator, even if you have the RTL code for them available?
This is where the concept of hybrid emulation comes in. You need to partition the design into what will use the emulator fabric and what will run in a behavioral model. The processor is often a target for this and would use the same behavioral model that was used with the simulator. Hybrid emulation is a feature that your emulator provider will have, and they should have tools suitable for making that part of the system efficient. There may be cases, however, where this situation is reversed. What if you are developing your own RISC-V processor and you want to verify it? Now, you may want to make virtual models of one or more environments and have the RTL code of your processor running within those environments.
Most systems are not self-contained: they require some form of external stimulus, and their results have to be checked. The communication between the emulator and the host machine could be enabled by something like Accellera’s SCE-MI standard, which defines a communications protocol between external software or testbench and the emulator. Not all vendors use this standard, and some have variants of it, but they all work on the same concept. First, you take the verification intellectual property (VIP) model and split it into two pieces. The side that connects to the emulator converts transactions into logical signal transitions. The other side deals with higher-level protocols. In between is a communications protocol that optimizes when and how traffic flows between the two pieces. Again, your emulation provider should have a library of these transactor models available or can tell you how to create them.
Getting the communications wrong can kill your emulator performance. Consider an emulator that is running at 10MHz (a typical speed for an FPGA-based emulator without excessive tracing or debug enabled) connected to a behavioral model. Each time those two models have to exchange data, the emulator has to stop and will incur the latency of your physical communication system, potentially PCIe, plus the software layers, and then the simulator or testbench has to run. If that happens every cycle, your emulator will crawl, and that costs you a lot of money. This is why nobody wants to run an emulator using UVM –– which has to think up the next vector each time anything needs data. In this example, that means at least once per processor bus cycle. The more efficient the emulator, the more that speed degradation is costing you.
Compilation for emulation takes a long time, and a recompile should be avoided unless absolutely necessary. Later in the verification process, recompiling the design becomes less frequent as most of the bugs are found. The testbench, however, is changed often, almost on every run. If the testbench is written in the Verilog hardware description language, each change would trigger a recompilation, creating significant overhead. With Test Suite Synthesis for emulation, the output of the tool is C code to run on the processors and transactions for the I/O. These tests are stored in the design memory and loaded as an object file at runtime. Changes do not cause a recompile, just a reload of the memory, which is fast. This can dramatically change the compile dynamics.
Portable Stimulus can also make a real difference because we pre-generate all the stimulus for the run ahead of time. While it may not be possible to store all that stimulus inside the emulator, it is possible to utilize some memory such that large amounts of stimulus can be transferred in bulk, only stopping the emulator when the buffers need to be refilled.
At the same time, a similar modification can be made so that data for result checking is also buffered until it either needs to be flushed, or the emulator has been halted for more stimulus. At that point, the response data can be downloaded and checked to see if there have been any failures up to that time. If there have been, the emulator run can be terminated. There may already be enough information for debug to start. If not, the time of the failure is known, and thus a debug run in the emulator only has to be executed up to the time of first failure (figure 1).
Breker has partnered with some emulation providers to bring this solution to reality. They, in turn, have been working to perfect hybrid emulation, and now they can add the Breker Test Synthesis solution to that. We will continue working together to improve the combined solution and make the original concept of the Accellera standard a reality. We can now take a single model and target it to multiple execution platforms. This eliminates what was a tedious task and results in more efficient usage of the emulators. As always, please feel free to reach out to me with questions or comments.