🟧 Google announces 8th-generation TPU “8t/8i”, moving AI semiconductors to “application-specific design” with Broadcom and MediaTek

Company analysis

Google’s split of its TPU AI semiconductors into training and inference chips, together with its division of design partners, shows that competition in AI chips has shifted from “performance competition” to “optimization and supply strategy”.

🟧 Google announces 8th-generation TPU “8t/8i”: application-specific design and a division-of-labor system

Google has announced the eighth generation of its AI semiconductor, the TPU (Tensor Processing Unit): “TPU 8t” for training and “TPU 8i” for inference. The biggest change is that AI processing, previously handled by a single architecture, is now split by application, with a design optimized for each. Google has also assigned separate design partners to each chip in order to achieve both performance and efficiency.

  • “TPU 8t” for training
    → For training large-scale AI models. Broadcom is involved in the design, which emphasizes high performance and high-speed communication
  • “TPU 8i” for inference
    → For running AI services. MediaTek is involved in the design, which focuses on low power consumption and cost efficiency
  • Technological advancements
    → FP4 (low-bit computing) improves computational efficiency, and dedicated networks such as Virgo Network (for training) and Boardfly (for inference) significantly improve processing performance and power efficiency
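
To give a feel for what “low-bit computing” means, here is a minimal Python sketch of 4-bit floating-point quantization. It assumes the E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit) with one shared scale per block of values. The article does not say which FP4 variant or scaling scheme the TPU 8t/8i actually use, so the format, the function names `quantize_fp4` and `dequantize_fp4`, and the scaling rule are illustrative assumptions only.

```python
# Illustrative sketch only: the FP4 variant used by TPU 8t/8i is not stated
# in the article. E2M1 with a shared per-block scale is an assumption here.

# Magnitudes representable in E2M1 (1 sign, 2 exponent, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def quantize_fp4(values):
    """Map a block of floats onto E2M1 values plus one shared scale.

    Returns (quantized, scale), where each quantized entry is one of the
    signed E2M1 magnitudes and value is approximately quantized * scale.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0] * len(values), 1.0

    # Scale so the largest magnitude in the block maps to the largest E2M1 value.
    scale = max_abs / E2M1_MAGNITUDES[-1]

    quantized = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest representable magnitude.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        quantized.append(nearest if v >= 0 else -nearest)
    return quantized, scale


def dequantize_fp4(quantized, scale):
    """Recover approximate floats from E2M1 values and the shared scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.1, -0.7, 1.2, 3.0]
    q, s = quantize_fp4(weights)
    print(q, s)
    print(dequantize_fp4(q, s))
```

Each value needs only 4 bits plus a share of the block scale, so memory traffic and arithmetic cost drop sharply compared with 16-bit formats, at the price of coarser rounding (small values such as 0.1 above collapse to zero).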

As a result, Google has strengthened the core foundation of its “AI hypercomputer”, which is meant to optimize both AI development and service delivery.

🟧 The inevitability of “separating training and inference” in the AI agent era

Behind this announcement is the evolution of generative AI. Conventional AI focused on answering individual questions, but it has now evolved into agentic AI that carries out multiple processes in succession. This change has pushed the performance requirements for training and inference far apart.

Training performs a huge amount of computation in a limited environment, so “maximum performance” matters most. Inference runs constantly for every user, so “low latency, low power consumption, and low cost” matter most. Covering both with a single chip design therefore became inefficient, and application-specific optimization became essential.

Looking at the competition, NVIDIA covers both uses with GPUs, while Amazon separates Trainium (training) from Inferentia (inference). Google’s strategy goes one step further and aims for a higher level of optimization by also dividing its design partners by application.

🟧 Conclusion

Google’s 8th-generation TPU is divided into “8t” for training and “8i” for inference to maximize the efficiency of AI processing, and its design is further optimized by pairing each chip with a partner that has different strengths: Broadcom and MediaTek.

The separation of training and inference is a major trend in AI semiconductors today, and it resembles the split between Xeon for servers and Core for clients. As AI workloads become more sophisticated and integrated, it is quite possible that the two will converge again and return to some kind of unified architecture.
