All content on this blog is intended solely for the purpose of learning English and is not meant to infringe on any copyrights. All translations and citations are used for educational and non-commercial purposes. This blog is publicly accessible and free, with no profit being made from its content.
If you believe that any content on this blog infringes on your copyright, please contact me at the following email address, and I will promptly remove the content in question after verification.
Thank you for your understanding and support.
Contact: kechang.dev@gmail.com
The race is on to control the global supply chain for AI chips | 争夺全球 AI 芯片供应链控制权的竞赛已经开始
Jul 30th 2024
In 1958 Jack Kilby at Texas Instruments engineered a silicon chip with a single transistor. By 1965 Fairchild Semiconductor had learned how to make a piece of silicon with 50 of the things. As Gordon Moore, one of Fairchild’s founders, observed that year, the number of transistors that could fit on a piece of silicon was doubling on a more or less annual basis.
在 1958 年,Jack Kilby 在 Texas Instruments 设计了一块带有单个晶体管的硅芯片。到了 1965 年,Fairchild Semiconductor 已经学会如何在一块硅片上制造 50 个这样的晶体管。正如 Fairchild Semiconductor 的创始人之一 Gordon Moore 在那一年观察到的,能够放置在一块硅片上的晶体管数量基本上每年都在翻倍。 transistor n. 晶体管
In 2023 Apple released the iPhone 15 Pro, powered by the A17 Bionic chip, with 19bn transistors. The density of transistors has doubled 34 times over 65 years. That exponential progress, loosely referred to as Moore’s law, has been one of the engines of the computing revolution. As transistors became smaller they got cheaper (more on a chip) and faster, allowing all the hand-held supercomputing wonders of today. But the sheer number of numbers that AI programs need to crunch has been stretching Moore’s law to its limits.
在 2023 年,苹果发布了搭载 A17 仿生芯片的 iPhone 15 Pro,该芯片拥有 190 亿个晶体管。晶体管的密度在 65 年间翻倍了 34 次。这种指数级的进步,被宽泛地称为摩尔定律,一直是计算革命的引擎之一。随着晶体管变得更小,它们变得更便宜(每个芯片上的数量更多)且更快,从而实现了当今所有手持超级计算的奇迹。然而,人工智能程序需要处理的大量数据已经将摩尔定律推到了极限。 bionic adj. 仿生的 density n. 浓密;密度 exponential adj. 幂的;指数的 loosely adv. 宽松的;松散的 sheer adj. 十足的;纯粹的;陡峭的 crunch v. 嘎吱嘎吱地动 | n. 嘎吱声;艰难局面;危急情况
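A quick back-of-envelope check of that doubling claim, sketched in Python (the transistor counts and dates are the ones cited above):

```python
import math

# From one transistor in 1958 to 19bn on the A17 in 2023.
doublings = math.log2(19e9 / 1)
years = 2023 - 1958
print(f"{doublings:.1f} doublings")                   # ~34.1
print(f"{years / doublings:.1f} years per doubling")  # ~1.9
```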
The neural networks found in almost all modern AI need to be trained in order to ascertain the right “weights” to give their billions, sometimes trillions, of internal connections. These weights are stored in the form of matrices, and training the model involves manipulating those matrices, using maths. Two matrices—sets of numbers arrayed in rows and columns—are used to generate a third such set; each number in that third set is produced by multiplying together all the numbers in a row in the first set with all those in a column of the second and then adding them all up. When the matrices are large, with thousands or tens of thousands of rows and columns, and need to be multiplied again and again as training goes on, the number of times individual numbers have to be multiplied and added together becomes huge.
几乎所有现代 AI 中的神经网络都需要经过训练,以确定其数十亿甚至数万亿内部连接的正确“权重”。这些权重以矩阵的形式存储,训练模型涉及使用数学操作这些矩阵。两个矩阵——即按行和列排列的一组数字——被用来生成第三个这样的集合;第三个集合中的每个数字都是通过将第一个集合中的一行所有数字与第二个集合中的一列所有数字相乘,然后将它们全部相加得到的。当矩阵很大,包含成千上万行和列,并且在训练过程中需要反复相乘时,单个数字需要相乘和相加的次数就会变得非常庞大。 ascertain v. 查明;弄清;确定 manipulation n. 操纵;操作;处理
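The row-by-column multiply-and-add just described can be sketched in a few lines of plain Python. This is illustrative only, not how AI frameworks actually implement it:

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows.

    Each entry of the result pairs a row of A with a column of B:
    multiply the entries together pairwise, then add them all up,
    exactly as the paragraph above describes.
    """
    n, m, p = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must match"
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 rows, 3 columns
B = [[7, 8],
     [9, 10],
     [11, 12]]           # 3 rows, 2 columns
print(matmul(A, B))      # [[58, 64], [139, 154]]
```

Each of the four output entries above costs three multiplications and two additions; scale the matrices up to thousands of rows and columns, repeated over many training steps, and those multiply-adds become the enormous totals discussed below.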
The training of neural nets, though, is not the only objective that requires lightning-fast matrix multiplication. So does the production of high-quality video images that make computer games fun to play; and 25 years ago that was a far larger market. To serve it Nvidia, a chipmaker, pioneered the design of a new sort of chip, the graphics-processing unit (GPU), on which transistors were laid out and connected in a way that let them do lots of matrix multiplications at once. When applied to AI, this was not their only advantage over the central processing units (CPUs) used for most applications: they allowed larger batches of training data to be used. They also ate up a lot less energy.
然而,神经网络的训练并不是唯一需要极快矩阵乘法的目标。高质量视频图像的生成也是如此,正是这些图像让电脑游戏充满乐趣;而 25 年前,那是一个大得多的市场。为了服务这一市场,芯片制造商英伟达(Nvidia)开创了一种新型芯片的设计,即图形处理单元(GPU),其晶体管的布局和连接方式使其能够同时进行大量矩阵乘法。当应用于人工智能时,这并不是它们相对于用于大多数应用的中央处理单元(CPU)的唯一优势:它们允许使用更大批量的训练数据。此外,它们也消耗更少的能量。 objective n. 目标
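A rough way to see why doing many multiplications at once matters, sketched on a CPU with NumPy's vectorised routines standing in for a GPU's parallelism. This illustrates the principle only; it is not a benchmark of real training hardware:

```python
import time
import numpy as np

# The same multiply-and-add workload run two ways: one number at a time in
# pure Python, and handed to NumPy, whose vectorised routines operate on
# many numbers at once. GPUs push that second idea much further.
n = 200
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))

t0 = time.perf_counter()
C_loop = [[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(n)]
          for i in range(n)]
t_loop = time.perf_counter() - t0

t0 = time.perf_counter()
C_vec = A @ B
t_vec = time.perf_counter() - t0

print(f"one at a time: {t_loop:.2f}s, vectorised: {t_vec * 1000:.2f}ms")
print("results agree:", np.allclose(C_loop, C_vec))
```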
Training AlexNet, the model which ushered in the age of “deep learning” in 2012, meant assigning weights to 60m internal connections. That required 4.7×10¹⁷ floating-point operations (flop); each flop is broadly equivalent to adding or multiplying two numbers. Until then, that much computation would have been out of the question. Even in 2012, using the best CPUs would have required not just a lot more time and energy but also a simplified design. The system that trained AlexNet did all its phenomenal FLOPping with just two GPUs.
训练 AlexNet(在 2012 年开启 “深度学习” 时代的模型)意味着要为 6000 万个内部连接分配权重。这需要 4.7×10¹⁷ 次浮点运算(flop)。每次 flop 大致相当于对两个数字进行加法或乘法运算。在那之前,如此大量的计算是不可想象的。即使在 2012 年,使用最好的 CPU 不仅需要更多的时间和能源,还需要简化设计。而训练 AlexNet 的系统仅用两块 GPU 就完成了所有庞大的浮点运算。 usher n. 接待员;门房 | v. 引领;招待;做招待员
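For a sense of where such flop totals come from, a widely used rule of thumb from the scaling-law literature (not this article's method) estimates training compute as roughly 6 flop per parameter per training token. The counts below are public estimates, used only for illustration:

```python
# Rule of thumb (Kaplan et al., 2020): training a transformer costs
# roughly 6 flop per parameter per training token. The parameter and
# token counts below are public estimates, not figures from the article.
def training_flop(parameters: float, tokens: float) -> float:
    return 6 * parameters * tokens

# GPT-3: ~175bn parameters trained on ~300bn tokens.
print(f"{training_flop(175e9, 300e9):.1e} flop")  # roughly 3e23
```

The result lands close to the 3.1×10²³ flop figure cited for GPT-3 later in the piece.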
A recent report from Georgetown University’s Centre for Security and Emerging Technology says GPUs remain 10-100 times more cost-efficient and up to 1,000 times faster than CPUs when used for training models. Their availability was what made the deep-learning boom possible. Large language models (LLMs), though, have pushed the demand for calculation even further.
Georgetown University’s Centre for Security and Emerging Technology 的一份近期报告指出,在训练模型时,GPU 仍然比 CPU 高效 10 到 100 倍,速度快达 1,000 倍。正是 GPU 的可用性使得深度学习的繁荣成为可能。然而,大型语言模型(LLMs)进一步推动了计算需求的增长。
Transformers are go | Transformer 准备就绪
In 2018 Alec Radford, a researcher at OpenAI, developed a generative pre-trained transformer, or GPT, using the “transformer” approach described by researchers at Google the year before. He and his colleagues found the model’s ability to predict the next word in a sentence could reliably be improved by adding training data or computing power. Getting better at predicting the next word in a sentence is no guarantee a model will get better at real-world tasks. But so far the trend embodied in those “scaling laws” has held up.
2018年,OpenAI的研究员 Alec Radford 使用谷歌研究人员前一年描述的“Transformer”方法开发了一种生成式预训练 Transformer,即 GPT。他和他的同事发现,通过增加训练数据或计算能力,可以可靠地提高模型预测句子中下一个单词的能力。虽然在预测句子中下一个单词方面的进步并不能保证模型在现实世界任务中的表现会更好,但到目前为止,这些 “scaling laws” 所体现的趋势依然有效。 so far phrase. 迄今为止 embody v. 体现;使具体化;包含 hold up phrase. 举起;支撑;继续下去;推举
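A toy sketch of what such a scaling law looks like: loss falling as a smooth power law in training compute. The constants are invented for illustration; real scaling-law papers fit them to experiments:

```python
# Toy scaling law: predicted loss = a * compute**(-b).
# The constants a and b below are made up for illustration only.
a, b = 50.0, 0.05
for compute in (1e17, 1e20, 1e23, 1e25):
    loss = a * compute ** (-b)
    print(f"compute {compute:.0e} flop -> predicted loss {loss:.2f}")
```

Each jump in compute buys a smaller, but predictable, improvement, which is why labs keep scaling up.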
As a result LLMs have grown larger. Epoch AI, a research outfit, estimates that training GPT-4 in 2022 required 2.1×10²⁵ flop, 40m times as many as were used for AlexNet a decade earlier, and cost about $100m. Gemini Ultra, Google’s most powerful model, released in 2024, is reported to have cost twice as much; Epoch AI reckons it may have required 5×10²⁵ flop. These totals are incomprehensibly big, comparable to all the stars in all the galaxies of the observable universe, or the drops of water in the Pacific Ocean.
结果就是,LLMs 的规模变得更大。据研究机构 Epoch AI 估计,2022 年训练 GPT-4 需要进行 2.1×10²⁵ 次浮点运算(flop),是十年前用于 AlexNet 的 4000 万倍,花费约 1 亿美元。据报道,谷歌在 2024 年发布的其最强模型 Gemini Ultra 的成本是其两倍;Epoch AI 估计其可能需要进行 5×10²⁵ 次浮点运算。这些总量大得难以想象,堪比可观测宇宙中所有星系的星星数量,或太平洋中的水滴数量。 outfit n. 全套装备;整套服装; 配备;机构 | v. 装备;配置设备;供给服装
In the past the solution to excessive needs for computation has been a modicum of patience. Wait a few years and Moore’s law will provide by putting even more, even faster transistors onto every chip. But Moore’s law has run out of steam. With individual transistors now just tens of nanometres (billionths of a metre) wide, it is harder to provide regular jumps in performance. Chipmakers are still working to make transistors smaller, and are even stacking them up vertically to squeeze more of them onto chips. But the era in which performance increased steadily, while power consumption fell, is over.
过去,对计算需求过大的解决方案是保持一点耐心。等待几年, Moore’s law 就会通过在每个芯片上放置更多、更快的晶体管来提供帮助。但是, Moore’s law 已经失去了动力。由于单个晶体管现在只有几十纳米(十亿分之一米)宽,因此很难实现性能的定期跃升。芯片制造商仍在努力缩小晶体管的尺寸,甚至将它们垂直堆叠以在芯片上容纳更多晶体管。但是,性能稳步提高而功耗下降的时代已经结束。 run out of steam phrase. 失效;失去动力 nanometre n. 纳米
As Moore’s law has slowed down and the desire to build ever-bigger models has taken off, the answer has been not faster chips but simply more chips. Insiders suggest GPT-4 was trained on 25,000 of Nvidia’s A100 GPUs, clustered together to reduce the loss of time and energy that occurs when moving data between chips.
随着 Moore’s law 的放缓和构建更大模型的需求不断增加,解决方案不再是更快的芯片,而是更多的芯片。据业内人士透露,GPT-4 是在 25,000 个 Nvidia A100 GPU 上进行训练的,这些GPU被集群在一起,以减少芯片之间移动数据时的时间和能量损失。
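A rough sketch of why clusters of that size are needed. The per-chip throughput and utilisation below are assumptions for illustration: an A100 peaks at roughly 3.1×10¹⁴ flop/s in bf16, and real training jobs sustain only a fraction of peak:

```python
# Back-of-envelope estimate of a GPT-4-scale training run.
TRAINING_FLOP = 2.1e25      # Epoch AI's estimate for GPT-4, cited above
N_GPUS = 25_000             # the cluster size insiders suggest
PEAK_FLOP_PER_S = 3.1e14    # assumed A100 peak throughput (bf16)
UTILISATION = 0.4           # assumed fraction of peak actually sustained

seconds = TRAINING_FLOP / (N_GPUS * PEAK_FLOP_PER_S * UTILISATION)
print(f"about {seconds / 86_400:.0f} days")  # on the order of 80 days
```

Even with 25,000 chips running in parallel, the run takes months, which is why data movement between chips matters so much.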
Much of the $200bn that Alphabet, Amazon, Meta and Microsoft plan to invest in 2024 will go on AI-related stuff, up 45% from last year; much of that will be spent on such clusters. Microsoft and OpenAI are reportedly planning a $100bn cluster in Wisconsin called Stargate. Some in Silicon Valley talk of a $1trn cluster within the decade. Such infrastructure needs a lot of energy. In March Amazon bought a data centre next door to a nuclear power plant that can supply it with a gigawatt of power.
Alphabet、Amazon、Meta 和 Microsoft 计划在 2024 年投资的2000亿美元中,大部分将用于与人工智能相关的项目,比去年增长了45%;其中很大一部分将用于这样的集群。据报道,Microsoft 和 OpenAI 计划在 Wisconsin 建立一个名为 Stargate 的 1000 亿美元集群。硅谷的一些人甚至谈论在十年内建立一个价值1万亿美元的集群。这样的基础设施需要大量的能源。今年三月,Amazon 购买了一个与核电站相邻的数据中心,该电站可以为其提供一吉瓦的电力。 infrastructure n. 基础设施 gigawatt n. 十亿瓦特 watt n. 瓦特
The investment does not all go on GPUs and the power they draw. Once a model is trained, it has to be used. Putting a query to an AI system typically requires roughly the square root of the amount of computing used to train it. But that can still be a lot of calculation. For GPT-3, which required 3.1×10²³ flop to train, a typical “inference” can take around 6×10¹¹ flop. Chips known as FPGAs and ASICs, tailored for inference, can make running AI models more efficient than doing so on GPUs.
投资并不全用于 GPU 及其消耗的电力。模型一旦训练完成,就需要被使用。向人工智能系统提交查询通常需要大约相当于训练该系统所用计算量的平方根的计算量。但这仍然可能是大量的计算。对于训练需要 3.1×10²³ 次浮点运算(flop)的 GPT-3 来说,一次典型的“推理”可能需要约 6×10¹¹ 次浮点运算。专为推理设计的 FPGA 和 ASIC 芯片可以让运行 AI 模型比使用 GPU 更高效。
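Applying the square-root heuristic above to the GPT-3 figures just cited (a rough rule, not an exact law):

```python
import math

# Inference compute ~ square root of training compute, per the heuristic
# in the paragraph above, using the GPT-3 training figure it cites.
training_flop = 3.1e23
per_query_flop = math.sqrt(training_flop)
print(f"~{per_query_flop:.1e} flop per query")  # ~5.6e+11
```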
Nevertheless, it is Nvidia that has done best out of the boom. The company is now worth $2.8trn, eight times what it was worth when ChatGPT was launched in 2022. Its dominant position does not rest only on its accumulated know-how in GPU-making and its ability to mobilise lots of capital (Jensen Huang, its boss, says Nvidia’s latest chips, called Blackwell, cost $10bn to develop). The company also benefits from owning the software framework used to program its chips, called CUDA, which is something like the industry standard. And it has a dominant position in the networking equipment used to tie the chips together.
尽管如此,Nvidia 在这次繁荣中表现得最好。该公司的市值现在达到 2.8 万亿美元,是 2022 年 ChatGPT 推出时的八倍。其主导地位不仅依赖于其在 GPU 制造方面积累的专业知识和调动大量资本的能力(其老板 Jensen Huang 表示,Nvidia 最新的芯片 Blackwell 的开发成本为 100 亿美元)。该公司还受益于拥有用于编程其芯片的软件框架 CUDA,这几乎就是行业标准。此外,Nvidia 在用于连接芯片的网络设备方面也占据主导地位。 mobilise v. 调用;动员
Supersize me | 规模扩张
Competitors claim to see some weaknesses. Rodrigo Liang of SambaNova Systems, another chip firm, says that Nvidia’s postage-stamp-size chips have several disadvantages which can be traced back to their original uses in gaming. A particularly big one is their limited capacity for moving data on and off (as an entire model will not fit on one GPU).
竞争对手声称发现了一些弱点。另一家芯片公司 SambaNova Systems 的 Rodrigo Liang 表示,Nvidia 的邮票大小的芯片有几个缺点,这些缺点可以追溯到它们最初在游戏中的用途。一个特别大的缺点是它们在数据传输方面的能力有限(因为整个模型无法放入一个GPU中)。
Cerebras, another competitor, markets a “wafer scale” processor that is 21.5cm across. Where GPUs now contain tens of thousands of separate “cores” running calculations at the same time, this behemoth has almost a million. Among the advantages the company claims is that, calculation-for-calculation, it uses only half as much energy as Nvidia’s best chip. Google has devised its own easily customised “tensor-processing unit” (TPU), which can be used for both training and inference. Its Gemini 1.5 AI model is able to ingest eight times as much data at a time as GPT-4, partly because of that bespoke silicon.
另一家竞争对手 Cerebras 推出了一款 “wafer scale”(晶圆级)处理器,宽达 21.5 厘米。当前的 GPU 包含数万个同时进行计算的独立 “核心”,而这款庞然大物则拥有将近一百万个核心。该公司声称的优势之一是,在进行相同计算时,它的能耗仅为 Nvidia 最好芯片的一半。Google 设计了自己的易于定制的 “tensor-processing unit”(TPU),可用于训练和推理。其 Gemini 1.5 AI 模型能够一次处理的数据量是 GPT-4 的八倍,部分原因是采用了这种专用芯片。 wafer n. 威化饼;圣饼;薄片 processor n. 处理器;加工机(工人) behemoth n. 巨兽; 巨头; 庞然大物 tensor n. 张量 ingest v. 咽下;摄取;吸收 bespoke adj. 定制的;专门定做的
The huge and growing value of cutting-edge GPUs has been seized on for geopolitical leverage. Though the chip industry is global, a small number of significant choke-points control access to its AI-enabling heights. Nvidia’s chips are designed in America. The world’s most advanced lithography machines, which etch designs into the silicon through which electrons flow, are all made by ASML, a Dutch firm worth $350bn. Only leading-edge foundries like Taiwan’s TSMC, a firm worth around $800bn, and America’s Intel have access to this tool. And for many other smaller items of equipment the pattern continues, with Japan being the other main country in the mix.
尖端 GPU 的巨大且不断增长的价值已被用于地缘政治杠杆。虽然芯片行业是全球性的,但少数几个重要的瓶颈控制着其在人工智能领域的制高点。Nvidia 的芯片是在 America 设计的。世界上最先进的光刻机——将设计蚀刻到硅片上以供电子流动——全部由 Dutch 公司 ASML 制造,该公司市值为 3500 亿美元。只有像 Taiwan 的 TSMC(市值约8000亿美元)和 America 的 Intel 这样的顶尖代工厂能够使用这一工具。而对于许多其他较小的设备,这种模式依然存在,日本是其中另一个主要国家。 cutting-edge adj. 前沿的;顶尖的 geopolitical adj. 地缘政治学的 chokepoint n. 阻塞点;瓶颈 leverage n. 杠杆;影响力 | v. 充分利用;举债经营 lithograph n. 石板画 lithography machine 光刻机 leading-edge adj. 顶尖的 access to phrase. 有权使用;有机会接近;能够使用 etch v. 蚀刻;凿刻 foundry n. 铸造厂;铸造车间
These choke-points have made it possible for the American government to enact harsh and effective controls on the export of advanced chips to China. As a result the Chinese are investing hundreds of billions of dollars to create their own chip supply chain. Most analysts believe China is still years behind in this quest, but because of big investments by companies such as Huawei, it has coped with export controls much better than America expected.
这些瓶颈使得 American 政府能够对向 China 出口先进芯片实施严厉且有效的控制。因此,China 正在投资数千亿美元来创建自己的芯片供应链。大多数分析人士认为,China 在这方面仍然落后数年,但由于华为等公司的大量投资,China 应对出口管制的效果比 America 预期的要好得多。 enact v. 制定(法律); 通过(法律); 扮演(角色) quest n. 探索;追求 | v. 寻找;进行探求
America is investing, too. TSMC, seen as a potential prize or casualty if China decided to invade Taiwan, is spending about $65bn on fabs in Arizona, with about $6.6bn in subsidies. Other countries, from India ($10bn) to Germany ($16bn) to Japan ($26bn), are increasing their own investments. The days in which acquiring AI chips has been one of AI’s biggest limiting factors may be numbered. ■
America 也在投资。TSMC 被视为 China 若决定入侵 Taiwan 时的潜在战利品或牺牲品,正斥资约 650 亿美元在 Arizona 建设晶圆厂,并获得约 66 亿美元补贴。其他国家,从 India(100亿美元)到 Germany(160亿美元)再到 Japan(260亿美元),也在增加自己的投资。AI 芯片的获取还是 AI 发展的最大限制因素之一的日子可能屈指可数。■ casualty n. 受伤的人;伤亡(人数);事故;急症室 invade v. 侵犯;侵略 fab n. 晶圆厂;半导体制造厂 subsidise v. 以津贴补助
- Author: Kechang
- Link: https://kechang.uk/article/e690e1ef-8399-4f62-a63f-56490fd01031
- Notice: This post is licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.