Nash Insights – AI Computing – N.03 2025

Graphics Processing Units (GPUs) have become a foundational technology in the advancement of artificial intelligence.

More recently, Neural Processing Units (NPUs) have gained prominence as specialized accelerators designed for machine learning workloads. But how do these architectures differ from the conventional Central Processing Unit (CPU), and in which contexts are they most effective?

GPUs were originally developed for gaming applications to offload computationally intensive graphics rendering, thereby freeing the CPU to manage other system tasks. Architecturally, GPUs differ from CPUs in two key ways: they integrate a large number of relatively simple cores and are optimized for massively parallel processing, whereas CPUs feature fewer, more complex cores designed for sequential and general-purpose computation.

Some GPU architectures contain thousands of cores, whereas mainstream consumer CPUs typically have from a few to a few dozen cores, though high-end server-grade CPUs may feature significantly more. Even so, a higher core count alone does not close the gap: CPU cores are built for low-latency execution of sequential, branch-heavy code rather than for the high-throughput, data-parallel work at which GPUs excel.

Although each GPU core is relatively simple, the sheer number of cores allows a complex computation to be decomposed into many smaller tasks that run concurrently. This gives GPUs “a clear advantage when analysing huge data sets, where the same calculations need to be performed on all of the data.” As a result, GPUs can substantially outperform CPUs in image recognition, deep learning, and other highly parallelizable workloads.
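The decomposition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how GPUs are actually programmed: the `brighten` operation and the `parallel_map` helper are hypothetical, and Python threads stand in for GPU cores. The data set is split into independent chunks, the same calculation is applied to every chunk concurrently, and the partial results are reassembled in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel: int) -> int:
    """Hypothetical 'same calculation' applied to every data element."""
    return min(pixel + 40, 255)


def process_chunk(chunk: list[int]) -> list[int]:
    # Each chunk is independent: it never needs results from another chunk.
    return [brighten(p) for p in chunk]


def parallel_map(data: list[int], workers: int = 4) -> list[int]:
    # Split the data set into roughly equal, contiguous chunks (ceiling division).
    size = max(1, -(-len(data) // workers))
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Run the chunks concurrently; executor.map returns results in input order.
    # Note: threads here illustrate the structure of the decomposition only;
    # CPython's global interpreter lock prevents a real speedup for this code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(process_chunk, chunks)
    # Reassemble the partial results into one list.
    return [value for part in results for value in part]


pixels = list(range(0, 256, 8))
assert parallel_map(pixels) == [brighten(p) for p in pixels]
```

The thread-based version shows the shape of the work rather than its performance; a GPU applies the same pattern with thousands of hardware threads, each handling a small slice of the data.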

Parallel processing is a critical technique for accelerating complex computations and data pattern recognition—core operations within deep learning and neural network architectures. As noted in industry analyses, “GPUs are used to speed up training times for machine learning applications and perform the kinds of tensor math and matrix multiplication ML systems require to make inferences and produce useful results.”
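To see why matrix multiplication suits parallel hardware, note that each element of the product C = AB is an independent dot product of one row of A and one column of B. The sketch below, in plain Python with hypothetical helper names `cell` and `matmul`, makes that independence explicit by enumerating every output element as a separate task. It runs the tasks sequentially and is illustrative only; production ML systems rely on heavily optimized libraries rather than code like this.

```python
from itertools import product

Matrix = list[list[float]]


def cell(a: Matrix, b: Matrix, i: int, j: int) -> float:
    # One output element: the dot product of row i of A and column j of B.
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a: Matrix, b: Matrix) -> Matrix:
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    rows, cols = len(a), len(b[0])
    # Every (i, j) task reads only A and B and writes only its own result,
    # so all rows * cols tasks could in principle run at the same time.
    tasks = list(product(range(rows), range(cols)))  # row-major order
    values = [cell(a, b, i, j) for i, j in tasks]
    # Fold the flat, row-major list of results back into a matrix.
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]
```

For example, `matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])` returns `[[19, 22], [43, 50]]`. An n×n product contains n² such independent tasks, which is the kind of work a GPU or NPU can spread across its many cores.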

While GPUs function as powerful machine learning accelerators, they are not designed to replace CPUs. Every computing system still relies on the CPU, often referred to as “the brain of the computer”, to manage general-purpose operations. In contrast, GPUs act as coprocessors that extend the system’s computational capacity and are themselves orchestrated by the CPU. As experts emphasize, “it is no longer a question of CPU versus GPU. More than ever, you need both to meet your varied computing demands. The best results are achieved when the right tool is used for the job.”

Experimental results show that running a large language model (LLM) solely on a CPU is suboptimal. Although this is feasible with extensive data and model optimizations, CPU utilization tends to spike during inference or training, which can degrade the performance of other processes that depend on the CPU. To keep the system efficient and responsive, LLM workloads should be offloaded to specialized AI accelerators such as GPUs or NPUs, which are designed to handle parallelized tensor computations.

While GPUs typically deliver higher performance, they also consume more energy. CPUs, by contrast, remain more common and more cost-effective to acquire and operate. A middle ground is a CPU with an integrated GPU: as noted in industry assessments, “CPUs with built-in GPUs deliver space-, cost-, and energy-efficiency benefits over dedicated graphics processors.”

Some processors go further and integrate an NPU alongside the CPU cores. NPUs complement both CPUs and GPUs in executing AI algorithms and neural network operations, and they perform the “high-performance inferencing tasks required by AI”. They are specifically designed to excel at executing large-scale matrix multiplications across extensive datasets.

Compared to GPUs, NPUs represent a parallel computing alternative optimized for energy efficiency. As industry analyses note, “NPUs prioritize data flow and memory hierarchy for better processing AI workloads in real-time.”

The computational power required to train and fine-tune an LLM depends heavily on the scale of the model, which is reflected in two factors: the number of parameters it contains and the volume of data it is trained on. The most significant developments and insights on this topic will be explored in the next issue of Nash Insights.
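As a rough illustration of how parameters and data combine, a widely used rule of thumb for dense transformer models estimates training compute as C ≈ 6·N·D floating-point operations (FLOPs), where N is the number of parameters and D the number of training tokens: roughly 2 FLOPs per parameter per token for the forward pass and 4 for the backward pass. The sketch below applies that approximation. The function names, model size, token count, GPU count, and per-GPU throughput are all hypothetical values chosen for illustration, and the estimate ignores attention overhead, hardware utilization losses, and the specifics of fine-tuning.

```python
SECONDS_PER_DAY = 86_400


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute for a dense transformer: C ≈ 6·N·D FLOPs."""
    return 6.0 * n_params * n_tokens


def gpu_days(total_flops: float, sustained_flops_per_gpu: float, n_gpus: int = 1) -> float:
    """Wall-clock days needed at an assumed sustained throughput per GPU."""
    return total_flops / (sustained_flops_per_gpu * n_gpus * SECONDS_PER_DAY)


# Hypothetical example: a 7-billion-parameter model trained on 2 trillion tokens.
flops = training_flops(7e9, 2e12)          # about 8.4e22 FLOPs
# Assumed sustained throughput of 4e14 FLOP/s per GPU, across 256 GPUs.
days = gpu_days(flops, 4e14, n_gpus=256)   # about 9.5 days
```

Doubling either the parameter count or the number of training tokens doubles the estimated compute, which is why both factors weigh so heavily on the hardware an LLM project requires.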