High-Performance Domestic Signal Processing Platform: Domestic DSP + FPGA + AI NPU Anlu Ziguang Solution

#FPGA development #microcontrollers #embedded hardware

I. With an NPU, do we still need a DSP?

· Both are processors designed to accelerate specific classes of algorithms; the difference is focus: NPUs target neural-network algorithms, while DSPs target signal-processing algorithms.

· NPUs adopt the Princeton (von Neumann) architecture.

· DSPs adopt the Harvard architecture.

· Each has its own instruction set.

DSPs specialize in digital signal processing. DSP processors themselves come in many models and families, each targeting different signal types, such as audio, images, and so on.
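As a concrete example of the kind of algorithm a DSP accelerates, here is a minimal sketch of a direct-form FIR filter in plain Python. The function name and test signal are illustrative, not taken from any vendor's library; the point is the multiply-accumulate (MAC) inner loop, which DSP hardware executes in a single cycle per tap.

```python
# Illustrative only: a direct-form FIR filter, the classic
# multiply-accumulate (MAC) workload that DSP hardware accelerates.
def fir_filter(signal, taps):
    """Convolve an input signal with FIR coefficients (taps)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        # One MAC per tap -- on a DSP, each of these is one cycle.
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

# A 4-tap moving-average filter smoothing a step input.
taps = [0.25, 0.25, 0.25, 0.25]
signal = [0, 0, 1, 1, 1, 1]
print(fir_filter(signal, taps))  # [0.0, 0.0, 0.25, 0.5, 0.75, 1.0]
```

A real DSP would run this loop from on-chip memory with zero-overhead looping and saturating fixed-point arithmetic; the sketch only shows the data flow.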

NPUs specialize in neural network processing, handling different types of data.

Running AI workloads on a general-purpose CPU and on an NPU yields very different speeds, so why can't we do without the CPU? Because an NPU can only handle AI computations; it is not suited to general-purpose tasks.

· A DSP (Digital Signal Processor) is designed specifically for digital signal processing. It is a specialized microprocessor (Harvard architecture with a fixed internal structure) that has a complete instruction set and operates on instructions and data, much like a CPU or ARM processor. DSP development follows embedded-software design practice, with the emphasis on algorithm implementation.

An NPU is an embedded neural network processor. It adopts a "data-driven parallel computing" architecture and is particularly adept at processing massive multimedia data such as video and images.
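To illustrate the "data-driven parallel computing" workload, here is a minimal sketch of a 2D convolution, the core operation of image-processing neural networks. The image and kernel values are made up for illustration; what matters is that every output element is computed independently, which is why an NPU can evaluate thousands of them in parallel rather than one MAC at a time.

```python
# Illustrative only: the inner loop of a neural-network convolution.
# Each output element is independent of the others, so an NPU can
# compute them all in parallel.
def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(ow)]
            for y in range(oh)]

# A 3x3 vertical-edge kernel over a 4x4 image with an edge in it.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
print(conv2d(image, kernel))  # [[3, 3], [3, 3]]
```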

NPUs integrate functions and services that support the required service operations. DSPs can be customized to increase channel density and speed, but they were not designed to enhance service capabilities such as better network resilience, finer-grained traffic management, more comprehensive service assurance, or flexibility for new service opportunities; NPUs hold the advantage in these areas.

As for measuring computing power, the international tech outlet AnandTech tested four smartphones: the Huawei Mate 9 (Kirin 960), Huawei Mate 10 Pro (Kirin 970), Google Pixel 2 XL (Snapdragon 835), and LG V30 (Snapdragon 835), comparing the AI performance of the ARM CPU, the Hexagon DSP, and the NPU. Results were measured with two metrics: performance (fps) and efficiency (mJ/inference).

NPUs outperform CPUs on AI computations by tens of times. Run on a CPU, these algorithms reach at most 2 fps while drawing heavy power; the average power consumption of both the Snapdragon 835 and Kirin 960 CPUs already exceeds their sustainable operating limits.

In contrast, the Snapdragon 835's Hexagon DSP delivers an 8-10x performance improvement over the CPU, and the Kirin 970's NPU reaches 1.5 to 4 times the performance of the Hexagon DSP. In terms of power efficiency, Huawei's NPU is in a different league from the CPU, while the Snapdragon 835's Hexagon DSP trails the Kirin 970's NPU overall by only about 6%.

Makers of traditional chips (CPUs, GPUs, and DSPs) attach great importance to the deep-learning market and, leveraging their scale and marketing strength, vigorously promote these chips for deep-learning workloads. In essence, this means fine-tuning existing technology and adapting traditional SIMD architectures to neural networks.

However, because traditional CPUs, GPUs, and DSPs are not inherently designed with hardware neurons and synapses as their fundamental processing units, they naturally have certain disadvantages compared to NPUs in deep learning. Given comparable levels of chip integration and manufacturing process technology, their performance will inevitably be inferior to NPUs.

II. DSP+FPGA+AI NPU Solution

Xinmai Technology's high-performance domestic signal processing platform adopts a hardware architecture featuring multi-channel bidirectional 10Gbps optical fiber data transmission + domestic V7 FPGA + multiple domestic multi-core DSPs. This platform can achieve high-speed real-time signal processing in areas such as integrated electronic systems, active phased array radar, electronic reconnaissance, MIMO communication, and sonar.
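Using the channel counts described for this platform (four groups of four bidirectional 10 Gbps fiber channels), a back-of-the-envelope aggregate bandwidth can be computed. The 20% line-coding overhead below is our assumption (typical of 8b/10b encoding), not a figure from Xinmai Technology.

```python
# Back-of-the-envelope bandwidth for the fiber front end:
# 4 groups x 4 channels, 10 Gbps per channel, each direction.
groups, channels_per_group, gbps_per_channel = 4, 4, 10
line_rate = groups * channels_per_group * gbps_per_channel  # per direction

# Assumed 20% coding overhead (8b/10b-style); actual overhead depends
# on the link protocol the platform uses.
payload_rate = line_rate * 0.8

print(f"{line_rate} Gbps line rate per direction, "
      f"~{payload_rate:.0f} Gbps usable payload")  # 160 / ~128 Gbps
```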

The block diagram of Xinmai Technology's signal processing platform is shown in Figure 1. It consists of four sets of 4-channel bidirectional optical fiber interfaces, an FPGA and its expansion circuits (two groups of independently readable/writable DDR3 memory, four serial ports extended by