Verifying AI Engines

September 20, 2021, by Adnan Hamid

It has been said that more than 100 companies are currently developing custom hardware engines to accelerate machine learning (ML). Some target the data center, where huge amounts of algorithm development and training are being performed. Power consumption has become one of the largest cost components of training, which today often relies on large numbers of high-end GPUs and FPGAs. It is hoped that dedicated accelerators will both speed up this work and perform it using a fraction of the power. Algorithms and networks are evolving so rapidly that these devices must retain maximum flexibility.

Other accelerators focus on the inference problem: running input sets through a trained network to produce a classification. Most are deployed in the field, where power, performance and accuracy must all be optimized. Many are designed for a particular class of problem, such as audio or vision, and are targeted at segments including consumer, automotive or the IoT. Each of these targets narrows the flexibility that is actually needed. Flexibility thus becomes a design optimization choice: the more that is fixed in hardware, the greater the performance or the lower the power, but the less the software side can change.

At the heart of most of these devices is an array of computational elements with access to distributed memory for holding weights and intermediate data. Waves of data are fed into the computational array and results flow out of it. The computational elements could be fixed-function (such as a multiply-accumulate unit) or specialized DSP blocks. Some implement them as hardwired blocks, others as highly focused processors, and yet others as something closer to FPGA blocks. Each of these blocks is then replicated many times and connected through an interconnect fabric. Other parts of the device manage the flow of data through the chip or perform custom functions that do not map nicely onto the computational fabric.
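
To make that concrete, here is a purely illustrative C++ sketch of the structure being described: a fixed-function multiply-accumulate tile with its own local weight memory, replicated into a small grid. No particular vendor's architecture is implied, and the interconnect fabric is deliberately left out.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <vector>

// Illustrative model only: a fixed-function multiply-accumulate tile with a
// small local weight memory, replicated into a grid. Real accelerators vary
// widely in datapath width, memory hierarchy and interconnect fabric.
struct MacTile {
    std::vector<int32_t> weights;   // distributed memory local to this tile
    int64_t accumulator = 0;

    // Consume one wave of activations: accumulate weight * activation.
    void consume(const std::vector<int32_t>& activations) {
        for (size_t i = 0; i < activations.size() && i < weights.size(); ++i)
            accumulator += int64_t(weights[i]) * activations[i];
    }
};

int main() {
    // A tiny 2x2 computational array; the fabric connecting tiles is not modelled.
    std::array<std::array<MacTile, 2>, 2> tiles;
    for (auto& row : tiles)
        for (auto& t : row)
            t.weights = {1, 2, 3};

    std::vector<int32_t> wave = {4, 5, 6};  // one wave of input data
    for (auto& row : tiles)
        for (auto& t : row)
            t.consume(wave);

    std::printf("tile(0,0) accumulator = %lld\n",
                static_cast<long long>(tiles[0][0].accumulator));
    return 0;
}
```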

The big question is: how does one verify these devices? I am not going to claim that I have the answer. It is not clear if anyone has a definitive answer today. This subject area is nascent, and everyone is trying to learn from the best practices of the past while recognizing the unique challenges of these devices.

The programming scheme for these devices is also considerably different from the relationship between software and an instruction-set processor. Traditional programming languages are stable; while new ones are being designed, they all rely on the same underlying concepts and mainly try to optimize the development process. AI software, by contrast, goes through a compilation process that is highly unpredictable and produces non-deterministic results. Retraining a network may produce an inference network that, once quantized and optimized, is considerably different from its predecessor. This means that ‘real’ software has less relevance to the verification task than in the past.

It is important to consider the task being performed. When using Portable Stimulus (PSS) and Test Suite Synthesis, we are not trying to ascertain whether the architecture is a good one; we are trying to ascertain that the architecture as defined works. The PSS graph does not know what representative workloads look like, or what might represent worst-case conditions. While you may be able to extract a power trace from a testcase generated by Test Suite Synthesis, it may not represent a typical operating condition. Likewise, it is not possible to generate tests that directly measure throughput. Verifying these attributes requires over-constraining the model to generate synthetic benchmarks, an approach with pros and cons compared to real-life sample workloads and networks.

But verifying the engine on a few sample workloads does not provide enough confidence that it will handle the range of networks it is likely to encounter during its lifespan. It thus becomes important to develop effective verification strategies that exercise the fundamental dataflows through these engines. Luckily, this is a task highly suited to PSS and Test Suite Synthesis.

A neural network is a set of nodes with connections between them. The best way to verify that is to focus on the outcomes: the results you want to see from the last node. Then you can say that if this is the result you want to see, the previous three nodes must have these inputs, and thus the nodes before those must have… and so on. That is how the problem solver inside the Test Suite Synthesis engine works, and that is why it is well suited to this class of problem.
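
As a rough, hypothetical sketch of that backwards style of reasoning (this is not Breker's solver, just the bare graph walk at its core), consider starting from the node whose result we want to observe and working out everything upstream that must be driven:

```cpp
#include <cstdio>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy sketch of outcome-driven reasoning: given the node whose result we want
// to observe, walk the dataflow graph backwards to find everything upstream
// that must execute. A real scenario solver also resolves constraints and
// schedules resources along the way; none of that is modelled here.
int main() {
    // Edges point from producer to consumer: a -> c, b -> c, c -> e, d -> e.
    // The map is keyed by consumer so we can look up its producers.
    std::multimap<std::string, std::string> producers_of = {
        {"c", "a"}, {"c", "b"}, {"e", "c"}, {"e", "d"}};

    std::vector<std::string> work = {"e"};  // the outcome we want to see
    std::set<std::string> required;         // everything that must execute

    while (!work.empty()) {
        std::string node = work.back();
        work.pop_back();
        if (!required.insert(node).second) continue;  // already visited
        auto range = producers_of.equal_range(node);
        for (auto it = range.first; it != range.second; ++it)
            work.push_back(it->second);
    }

    for (const auto& n : required)
        std::printf("must execute: %s\n", n.c_str());
    return 0;
}
```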

The job of the software toolchain for an AI engine is to schedule pieces of work on various pieces of hardware so that the right result comes out at the right time. Using the AI toolchain to come up with a representative test set is too difficult. Instead, we must start with the hardware itself and look at the capabilities built into its architecture. We want to make sure that a given queue of operations works correctly, or that a particular resource can be maxed out, and we can reason back through the network to see what must be fed in to reach those corner cases.
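
For example, one of those corner cases, filling a particular operation queue, can be pictured with a toy model like the one below. The queue depth and the integer "descriptor" are assumptions made purely for illustration, not a real device interface.

```cpp
#include <cstdio>
#include <queue>

// Toy model of one corner case: deliberately filling an operation queue of
// fixed depth and checking that excess work is refused (backpressure) rather
// than silently dropped. Depth and descriptor type are illustrative only.
struct OpQueue {
    static constexpr int kDepth = 8;
    std::queue<int> ops;

    bool push(int descriptor) {
        if (static_cast<int>(ops.size()) >= kDepth) return false;  // backpressure
        ops.push(descriptor);
        return true;
    }
};

int main() {
    OpQueue q;
    int accepted = 0, rejected = 0;
    for (int i = 0; i < 12; ++i)
        (q.push(i) ? accepted : rejected) += 1;

    std::printf("accepted=%d rejected=%d\n", accepted, rejected);
    // Expect exactly kDepth operations accepted and the remainder refused.
    return (accepted == OpQueue::kDepth && rejected == 12 - OpQueue::kDepth) ? 0 : 1;
}
```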

The verification of a neural accelerator must rely on the same hierarchical approach used for more traditional processors, but perhaps with a different weighting of effort. First, the tiles must be verified as close to exhaustively as possible. Depending on the complexity of the blocks, this could be done with SystemVerilog and a constrained-random verification methodology, or, if they contain processors, with PSS.
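
To illustrate the shape of that tile-level step, here is a minimal sketch of the check itself: random operands driven into the tile and compared against a golden reference. In a real flow the device under test would be RTL exercised from SystemVerilog or from PSS-generated tests; both models below are simple stand-ins used only to show the harness structure.

```cpp
#include <cstdint>
#include <cstdio>
#include <random>

// Illustrative stand-ins: in practice dut_mac would be the RTL tile driven
// through a simulator, and ref_mac the golden reference model.
int64_t dut_mac(int64_t acc, int32_t w, int32_t a) { return acc + int64_t(w) * a; }
int64_t ref_mac(int64_t acc, int32_t w, int32_t a) { return acc + int64_t(w) * a; }

int main() {
    std::mt19937 rng(1);
    std::uniform_int_distribution<int32_t> operand(-1000, 1000);

    int64_t dut_acc = 0, ref_acc = 0;
    for (int i = 0; i < 100000; ++i) {
        int32_t w = operand(rng), a = operand(rng);
        dut_acc = dut_mac(dut_acc, w, a);
        ref_acc = ref_mac(ref_acc, w, a);
        if (dut_acc != ref_acc) {          // any divergence is a tile bug
            std::printf("mismatch at iteration %d\n", i);
            return 1;
        }
    }
    std::printf("tile check passed\n");
    return 0;
}
```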

Second, the network that interconnects them must be verified. This could be a more extensive problem than is seen in system assembly today because of the large number of blocks involved, but the regular array structure that is likely to be used may simplify the task.

Third, we need to define graphs in PSS that capture the important dataflows through the device and use Test Suite Synthesis to explore them. At first this would be a simple verification of the individual flows; then multiples of these can be scheduled concurrently to look for unintended interactions between them or to locate congestion points in the device.
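
A hypothetical sketch of that last idea: if each dataflow is reduced to the sequence of resources it uses per step, overlaying several flows immediately shows where they would contend. The flow and resource names here are invented for illustration and carry no meaning beyond that.

```cpp
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy sketch: each dataflow is a sequence of (time step, resource) uses.
// Scheduling several flows together and counting how many want the same
// resource in the same step gives a crude picture of the congestion points.
struct Use { int step; std::string resource; };
using Flow = std::vector<Use>;

int main() {
    Flow conv   = {{0, "dma_in"}, {1, "mac_array"}, {2, "sram_bank0"}};
    Flow pool   = {{1, "mac_array"}, {2, "sram_bank1"}};
    Flow stream = {{0, "dma_in"}, {2, "sram_bank0"}};

    std::map<std::pair<int, std::string>, int> demand;
    for (const Flow* f : {&conv, &pool, &stream})
        for (const Use& u : *f)
            ++demand[{u.step, u.resource}];

    for (const auto& [key, count] : demand)
        if (count > 1)  // more than one flow wants this resource in this step
            std::printf("step %d: %d flows contend for %s\n",
                        key.first, count, key.second.c_str());
    return 0;
}
```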

Finally, some sample networks can be run to ascertain how effectively the device achieves its performance and power goals. However, the emphasis on real software is likely to be less than it was for traditional software running on processors.

The first wave of custom accelerator chips can be expected to hit the market quite soon. It remains to be seen how many will be successful, how many will have functional errors that are difficult to hide in the software toolchain, and how many will reach the desired levels of performance or power.

Breker Verification Systems will continue to work with the industry to ensure that it has access to the tools it needs to enable a good functional verification methodology. We are listening and learning along with the industry. Together we can do this.
