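1. KylinSoft Co., Ltd.; 2. Tianjin Key Laboratory of Operating System; 3. Department of International Politics, University of International Relations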
Research on the development paths of intelligent computing operating systems
Abstract: Artificial intelligence, a typical example of new quality productive forces in the information industry, has become a major strategic priority for the world's leading countries in enhancing national competitiveness and safeguarding national security. However, a shortage of computing power is emerging as a critical bottleneck constraining China's AI development. To address the ecosystem fragmentation of domestically developed computing power, this paper proposes building an intelligent computing operating system that is based on an AI-enhanced general-purpose server operating system and enabled by an intelligent computing platform. By better supporting the development and operation of AI applications, such a system can help meet the computing power demands of AI development in China. Focusing on the key components of the intelligent computing platform, the paper describes in detail the domestic and international development status of heterogeneous resource schedulers and AI programming frameworks, and analyzes progress in the management and scheduling of heterogeneous computing power and in distributed training. After describing the domestic and international AI server markets and noting that heterogeneous computing resource management has already become a reality, the paper points out the current state of AI computing power development in China, namely weak chip capability and a scattered industrial ecosystem. A systematic review of China's policies supporting operating system development further confirms the feasibility and necessity of developing an intelligent computing operating system. The paper then examines the two major components of the intelligent computing operating system, the AI enhancement of the general-purpose server operating system and the main functions of the intelligent computing platform, and offers suggestions for technological breakthroughs and innovative development of the intelligent computing operating system.
Key words: intelligent computing operating system; computing power; heterogeneous resource scheduler; AI programming framework; server operating system


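At present, a new round of technological revolution and industrial transformation is advancing rapidly. With the explosive growth of artificial intelligence technology, large models such as GPT-4 and Sora have emerged one after another, profoundly influencing the evolution of operating systems and further expanding their application space. Internationally, mainstream operating system vendors such as Microsoft and RedHat have actively embraced AI. Microsoft has invested more than US$10 billion in OpenAI over successive rounds and has launched a series of AI products and solutions, for example by using AI to enhance core products such as the Office suite and Bing search. Linux operating system vendors such as RedHat and Ubuntu, by providing driver support together with regular updates and maintenance, ensure full compatibility with CUDA and NVIDIA GPUs, and support mainstream machine learning and deep learning frameworks, libraries, and tools such as TensorFlow and PyTorch. In China, however, no relatively mature and complete intelligent computing operating system solution suited to the development of large AI models has yet emerged. In addition, the ecosystem fragmentation, architectural differences, and incomplete software of domestic computing platforms are becoming major bottlenecks constraining AI development in China.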






