Alibaba Cloud’s new technology “Aegaeon” sharply reduces the number of GPUs required for AI inference, showcasing China’s AI competitiveness.
SOSP 2025: The 31st Symposium on Operating Systems Principles (sigops.org)
🟧Aegaeon: Alibaba’s New Technology to Reduce GPU Usage by 82%
A research team from Chinese internet giant Alibaba Group and Peking University has announced a new cloud-based technology called “Aegaeon” that efficiently manages GPU resources. The announcement was made at SOSP 2025, an international conference held in Seoul, South Korea, and has attracted the attention of the cloud industry and investors.
- Significantly reduced GPU usage: In more than three months of testing on Alibaba Cloud, Aegaeon cut the number of NVIDIA “H20” GPUs required from 1,192 to 213, a reduction of approximately 82%.
- Visualizing LLM operating costs: The paper claims to reveal for the first time the “invisible cost structure” of serving large language models (LLMs).
- Efficiency of AI services: Aegaeon shares GPUs at the token level rather than dedicating a GPU to each model, allowing many models to run concurrently on the same hardware. This dramatically reduces AI inference costs.
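To make the token-level sharing idea concrete, here is a toy scheduler sketch. This is an illustration of the general concept only, not Aegaeon’s actual implementation: the model names, the `Request` structure, and the simple round-robin policy are all assumptions for the example. The point it shows is that switching between models at token (decoding-step) boundaries keeps one GPU busy serving many models, instead of leaving per-model GPUs idle between requests.

```python
# Toy sketch of token-level GPU sharing (illustrative only; NOT Aegaeon's
# real scheduler). One "GPU" is time-sliced across requests for different
# models, preempting at each token boundary rather than per request.

from collections import deque
from dataclasses import dataclass, field

@dataclass
class Request:
    model: str                      # which model this request targets (hypothetical names)
    tokens_left: int                # decoding steps remaining
    generated: list = field(default_factory=list)

def token_level_schedule(requests):
    """Round-robin one decoding step (one token) at a time across requests.

    A per-model GPU assignment would leave hardware idle whenever that
    model has no work; token-granularity switching lets a single GPU
    interleave all models' decoding steps.
    """
    queue = deque(requests)
    timeline = []                   # which model held the GPU at each step
    while queue:
        req = queue.popleft()
        req.generated.append(f"{req.model}-tok{len(req.generated)}")  # one decode step
        req.tokens_left -= 1
        timeline.append(req.model)
        if req.tokens_left > 0:
            queue.append(req)       # preempt at the token boundary, requeue
    return timeline

timeline = token_level_schedule([
    Request("model-a", tokens_left=3),
    Request("model-b", tokens_left=2),
])
print(timeline)  # -> ['model-a', 'model-b', 'model-a', 'model-b', 'model-a']
```

In the real system the switch cost is dominated by moving model weights and KV-cache state on and off the GPU, which is exactly the overhead a production scheduler must amortize; the toy above ignores that entirely.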
🟧GPU shortage and US-China AI competition promoted a “computing efficiency revolution”
The background to this technological development is the GPU shortage caused by US semiconductor export restrictions on China. Chinese companies depend heavily on NVIDIA GPUs, and U.S. export controls have limited the supply of modern chips such as the H100 and B200. For this reason, Chinese companies are focusing on “technology that maximizes performance with limited GPUs”, and Aegaeon fits squarely within this trend.
Alibaba is also rapidly expanding its own “Qwen” (Tongyi Qianwen) AI model series, and its inference workload is growing accordingly. Aegaeon, which raises GPU efficiency, is therefore not a mere cost-cutting measure but an infrastructure strategy for securing the sustained competitiveness of Alibaba’s AI business. Alibaba is already developing its own AI chips as well, redesigning its AI cost structure from both the software and hardware sides.
🟧Summary
Alibaba’s “Aegaeon” has emerged as a core element of China’s AI cloud strategy: a technology that fundamentally overhauls GPU efficiency in AI inference. An 82% reduction in GPU usage is a result that cloud providers cannot ignore.
💬Aegaeon is not a generic cloud optimization but a scheduling technology that rethinks how AI computation is distributed and shared. It is highly effective for inference, but training involves many communication and synchronization constraints that make the approach hard to apply there. In that respect, NVIDIA’s dominance is likely to continue for some time.

