Deep neural networks (DNNs) that power foundation models and generative AI are transforming our lives. However, traditional digital computing architectures are poorly suited to these models because they separate memory from processing units, forcing data to move back and forth between the two, which costs both speed and energy. Hardware designed for AI inference can ease this bottleneck, but much of it still relies on the same split architecture.
Analog AI is an approach to AI computation that mimics how the brain works. It uses nanoscale phase-change memory (PCM) devices to store and process data as a continuous range of values rather than just 0s and 1s, which makes it faster and more energy-efficient than conventional digital AI hardware. However, analog AI faces two main challenges: it must match the accuracy of digital AI, and it must integrate well with the digital components on the same chip.
A new chip that uses phase-change memory for in-memory AI inference
Neural networks are powerful tools for artificial intelligence, but they require a lot of energy and time to process data. One way to overcome this challenge is analog in-memory computing (AIMC), which performs computations directly within the memory where the network weights are stored. This reduces the need to move data around, saving energy and cutting latency.
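To make the idea concrete, here is a minimal NumPy sketch of the in-memory-computing principle: the weight matrix is written into the memory array once, and every subsequent matrix-vector multiply happens where the weights sit, with a small noise term standing in for analog imprecision. The array size and noise level are my own illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Program" a 256 x 256 weight layer into the in-memory array (done once).
weights = rng.standard_normal((256, 256)).astype(np.float32)

def aimc_mvm(x, read_noise_std=0.02):
    """In-place MVM: y = W @ x computed where W is stored, with read noise."""
    noisy_w = weights * (1.0 + read_noise_std * rng.standard_normal(weights.shape))
    return noisy_w @ x

# The same stored weights serve a whole stream of activations without
# being re-fetched from off-chip memory between multiplies.
for _ in range(4):
    x = rng.standard_normal(256).astype(np.float32)
    y = aimc_mvm(x)
```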
However, AIMC alone is not enough to deliver end-to-end performance gains: it must be combined with on-chip digital operations and communication, and it requires robust, scalable memory devices. A team of researchers from IBM and other institutions has developed a multicore AIMC chip that integrates all of these components, using phase-change memory (PCM) as the memory device.
PCM is a type of resistive memory that stores multiple levels of information by changing its electrical resistance. It can also perform matrix-vector multiplications (MVMs) directly: input voltages are applied to the memory cells and the resulting currents are read out. The researchers designed and fabricated a chip with 64 AIMC cores, each containing a 256 × 256 array of PCM cells, interconnected by an on-chip communication network. The chip also implements, in digital circuitry, the activation functions and the additional processing required by convolutional and recurrent neural networks.
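The sketch below models, in highly simplified form, how such a crossbar performs an MVM: signed weights are mapped onto pairs of conductances, inputs are applied as read voltages, Ohm's law and Kirchhoff's current law produce the column currents, and an 8-bit converter digitizes the result. The conductance range, read voltage and converter model are assumptions for illustration, not the chip's actual circuit parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
G_MAX = 25e-6  # assumed maximum device conductance (siemens); illustrative only

def program_crossbar(weights):
    """Map signed weights onto differential conductance pairs (G_plus, G_minus)."""
    scale = np.max(np.abs(weights))
    g = weights / scale * G_MAX
    g_plus = np.clip(g, 0.0, None)
    g_minus = np.clip(-g, 0.0, None)
    return g_plus, g_minus, scale

def crossbar_mvm(g_plus, g_minus, scale, x, v_read=0.2, adc_bits=8):
    """Apply inputs as voltages, sum column currents, digitize, and rescale."""
    x_max = np.max(np.abs(x))
    v = x / x_max * v_read                    # DAC: inputs -> read voltages
    i = (g_plus - g_minus).T @ v              # Ohm's law + Kirchhoff: column currents
    i_fs = G_MAX * v_read * len(x)            # assumed ADC full-scale current
    levels = 2 ** (adc_bits - 1) - 1
    code = np.clip(np.round(i / i_fs * levels), -levels, levels)  # 8-bit signed codes
    i_hat = code / levels * i_fs              # dequantized column currents
    return i_hat / (G_MAX * v_read) * scale * x_max  # back to weight/input units

weights = rng.standard_normal((256, 256)).astype(np.float32)
x = rng.standard_normal(256).astype(np.float32)
g_plus, g_minus, scale = program_crossbar(weights)
y_analog = crossbar_mvm(g_plus, g_minus, scale, x)
y_exact = weights.T @ x  # digital reference for comparison
```

Storing each weight as the difference of two conductances is a common way to represent signed values with devices whose conductance can only be positive; the quantization step mimics the limited precision of the analog-to-digital conversion at the array's edge.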
The chip achieves near-software-equivalent inference accuracy on ResNet and long short-term memory (LSTM) networks while performing all the computations associated with the weight layers and the activation functions on chip. For 8-bit input/output MVMs, it reaches a maximum throughput of 16.1 or 63.1 tera-operations per second (TOPS) at an energy efficiency of 2.48 or 9.76 TOPS per watt, respectively, depending on the operational mode.
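As a quick consistency check on those figures, throughput divided by energy efficiency gives the implied power budget, and counting one multiply plus one add per weight gives the operations in a single 256 × 256 MVM. The snippet below only does that arithmetic on the numbers quoted above; the mode labels are placeholders, not the paper's terminology.

```python
# Back-of-the-envelope arithmetic on the reported figures only.
modes = {
    "mode A": {"tops": 16.1, "tops_per_w": 2.48},
    "mode B": {"tops": 63.1, "tops_per_w": 9.76},
}
ops_per_mvm = 2 * 256 * 256  # one multiply + one add per weight in a 256x256 MVM

for name, m in modes.items():
    power_w = m["tops"] / m["tops_per_w"]          # implied power budget
    mvms_per_s = m["tops"] * 1e12 / ops_per_mvm    # peak 256x256 MVM rate
    print(f"{name}: ~{power_w:.1f} W, ~{mvms_per_s:.2e} MVMs per second")
```

Both operational modes imply a power budget of roughly 6.5 W, so the higher-throughput mode is also the more energy-efficient one at essentially the same power.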
This work demonstrates that PCM-based AIMC can enable high-performance, low-power and scalable neural network inference on a single chip. It also opens up new possibilities for exploring novel architectures and applications for AIMC.
You can read the full paper here: https://www.nature.com/articles/s41928-023-01010-1