Unlike a conventional Von Neumann architecture, which is designed to move data, the runAI200 devices used in the tsunAImi accelerator cards are designed for computation: the processing elements are located inside the memory, creating a distributed processing array.
The runAI200 devices use integer data types and a batch size of 1. Each memory bank pairs 385Kbyte of SRAM with a 2D array of 512 processing elements; 511 banks per chip combine to provide 200Mbyte per device. Operating in 'sport' mode, the device delivers up to 502 TOPS; configured in 'eco' mode, it delivers 8 TOPS/W.
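As a sanity check, the per-bank and per-chip figures quoted above can be multiplied out. The bank count and bank size are from the article; the quoted 200Mbyte total appears to be a rounded marketing figure:

```python
# Memory capacity implied by the article's figures:
# 511 banks per chip, 385 Kbyte of SRAM per bank.
banks_per_chip = 511
kbyte_per_bank = 385

total_kbyte = banks_per_chip * kbyte_per_bank
total_mbyte = total_kbyte / 1000  # decimal megabytes

print(total_kbyte)         # 196735 Kbyte
print(round(total_mbyte))  # ~197 Mbyte, quoted as 200 Mbyte per device
```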
The accelerator card’s compute power translates into more than 80,000 frames per second of ResNet-50 v1.5 throughput at batch size 1, which the company says is three times the throughput of its nearest competitor. In a natural language processing benchmark, the accelerator cards process more than 12,000 queries per second of BERT-base, which the company claims is four times faster than any announced product.
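The relative claims imply rough figures for the nearest competing products. This is simple arithmetic on the numbers quoted in the announcement, not data published by the company:

```python
# Throughput figures quoted for the tsunAImi card.
resnet_fps = 80_000  # ResNet-50 v1.5 at batch size 1
bert_qps = 12_000    # BERT-base queries per second

# "three times the throughput of its nearest competitor"
competitor_resnet_fps = resnet_fps / 3
# "four times faster than any announced product"
competitor_bert_qps = bert_qps / 4

print(round(competitor_resnet_fps))  # ~26667 fps implied for the nearest rival
print(round(competitor_bert_qps))    # 3000 qps implied for the fastest rival
```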
The runAI200 devices are manufactured using a cost-effective 16nm process.
AI inference will be a significant workload in data centres, where the devices' computation density will accelerate performance for smart cities and other AI and machine learning applications.
The tsunAImi accelerator card is a standard form factor PCI Express card for use in cloud or server deployments. It supports the TensorFlow and PyTorch open-source machine learning frameworks.
To accompany the accelerator card, the Untether AI imAIgine software development kit (SDK) offers push-button quantisation, optimisation, physical allocation and multi-chip partitioning. It also provides a visualisation toolkit, a cycle-accurate simulator and a runtime API for integration.
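The imAIgine SDK's quantisation flow is proprietary, but the underlying idea of mapping floating-point weights onto the integer data types the runAI200 computes with can be sketched with textbook symmetric INT8 quantisation. This is an illustration of the general technique, not the SDK's actual algorithm:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantisation (illustrative only)."""
    # Scale maps the largest weight magnitude onto the INT8 range [-127, 127].
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate floating-point values from the integer codes.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                 # integer codes stored on-device
print(dequantize(q, s))  # close to the original weights
```

Running inference on integer codes like these is what lets an accelerator trade floating-point hardware for denser, lower-power integer arithmetic.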
The tsunAImi accelerator card is sampling now and will be commercially available in Q1 2021. The imAIgine SDK is in early access with select customers and partners.