Accelerated RISC-V core optimised for edge AI and cryptography

Red Semiconductor has announced RISC-V instruction set extensions, and a hardware design, for edge AI and cryptography in asics and FPGAs.

The hardware, called ‘VISC’, is an accelerated RISC-V core that “optimises complex mathematical algorithms for parallel execution in its reconfiguration hardware engine”, according to Red, which claims: “The VISC ISA [instruction set] enables developers to describe complex algorithms in just a fraction of the code size it would take with the standard RISC-V instruction set, RISC-V vector extensions, or other ISA like x86 and Arm.”

“Our instructions efficiently vectorise RISC-V’s standard scalar instructions to enable VISC’s parallel execution sequencing to be applied to them,” added Red. “It turns out this is a much neater way that using RISC-V’s own RVV [vector] instructions and vector registers.”

Described as ‘single-issue multi-execute’ hardware, VISC, uses precoding to parallelise RISC-V scalar instructions. Registers, decoders and the execution engine, according to the company, are optimised for parallel computation of functions like FFT (fast Fourier transform), DCT (discrete cosine transform), matrix multiplication and ‘big integer’ maths – the latter giving accurate results from long number arithmetic, compared with floating point approximations.

For FFT, the company told Electronics Weekly that 8-12 instructions with its architecture replace “over 2,000 instructions in other ISAs”, then 24-30 instructions compared with >2,000 for DCT, and 10-14 instructions compared with >500 for ‘positional popcount’ big data analysis.

“In each case, the reduction is a result of the combination of VISC’s set-up instructions which configure the hardware for optimum execution sequence, and the ‘maths toolbox’ instructions which utilise new instruction forms that VISC layers on top of the basic RISC-V instructions,” said Red, which is not revealing how the hardware is reconfigured, nor the instructions used in the examples above.

Will all this instruction compression result in higher gate count?

“The efficiency of optimisation for parallel execution means we may have a little more die area in our single-issue decompression-reconfiguration engine,” Red CEO James Lewis (pictured) told Electronics Weekly, claiming: “But we save die area because we multiply just the execution units, whereas [others] must multiply the entire multi-stage processor architecture to get the same result. We therefore get superior performance – in terms of algorithms executed – per clock cycle, per silicon area and per watt consumed.”

Scalability for different implementations is through varying the number of execute pathways per instruction pipeline, the number of instruction pipelines per VISC core or the number of VISC cores.

“Launch variants of the VISC core will focus on scalability of execution pathways, with instruction pipeline and core scaling coming later in the roadmap,” said Red.

RED Semiconductor is part of the first cohort of the UK Government-backed ChipStart incubator, launched through the National Semiconductor Strategy and run by Silicon Catalyst UK.