
    A crack in the Nvidia Empire

Published: 2023-12-19


In 2012, two major events occurred in the AI industry. In chronological order, the first was the debut of the long-established Google Brain: a deep learning network dubbed "Google Cat" that could recognize cats, with a recognition accuracy of 74.8%, 0.8 percentage points higher than the 74% posted by the previous year's winning algorithm in the well-known image recognition competition ImageNet.

But Google's moment in the spotlight lasted only a few months. In December 2012, the winner of the latest ImageNet was announced: deep learning pioneer Geoffrey Hinton and his students entered with the convolutional neural network AlexNet, raising recognition accuracy to 84% and ushering in the next decade's AI revolution. Google Cat was buried in the dust of history.

    Hinton and two students, 2012

What shocked the industry was not just the model itself. This neural network, trained on 14 million images with a total of 26.2 billion floating-point operations, used only two Nvidia GeForce GTX 580s over a week of training. As a reference, Google Cat had used 10 million images, 16,000 CPU cores, and 1,000 computers.

Rumor has it that Google also secretly entered the competition that year, and the shock it received was directly reflected in its subsequent actions: Google spent a whopping $44 million to acquire the Hinton team and immediately placed large GPU orders with Nvidia for artificial intelligence training, while giants such as Microsoft and Facebook joined the buying spree.

Nvidia became the biggest winner, with its stock price rising as much as 121-fold over the next 10 years. An empire was born.

But over the empire, two dark clouds gradually gathered. Google, which had once swept up Nvidia GPUs, made a stunning debut with AlphaGo three years later and defeated the human champion Ke Jie in 2017. Observant people noticed that the chips driving AlphaGo were no longer Nvidia's GPUs, but Google's self-developed TPUs.

Three years later, a similar plot played out again. Tesla, once regarded by Jensen Huang as a benchmark customer, also bid farewell to Nvidia GPUs: first it launched the FSD in-car chip built around an NPU, then it unveiled the D1 chip used to build AI training clusters. Nvidia had lost two of its most important customers of the AI era.

In 2022, as the global IT cycle entered a downturn, cloud computing giants cut their GPU procurement budgets for data centers and the cryptocurrency-mining boom cooled off. On top of that, the US chip ban on China blocked sales of high-end graphics cards such as the A100/H100 to the Chinese market, causing Nvidia's inventory to pile up and its stock price to fall by two-thirds from its peak.

At the end of 2022, ChatGPT emerged, and GPUs, as the fuel for large-model "alchemy", were once again in hot demand. Nvidia breathed a sigh of relief, but a third dark cloud followed: on April 18, 2023, the technology outlet The Information revealed that Microsoft, the instigator of this wave of AI, was secretly developing its own AI chip.

The chip, codenamed Athena, is manufactured by TSMC on an advanced 5nm process, and Microsoft's R&D team numbers nearly 300. Its goal is obvious: replace the expensive A100/H100, provide OpenAI with a computing engine, and ultimately carve off a piece of Nvidia's cake through Microsoft's Azure cloud services.

Microsoft is currently the largest purchaser of Nvidia's H100; there were even rumors that it wanted to buy out the H100's entire annual production capacity. This breakup signal from Microsoft is a bolt from the blue: bear in mind that even in Intel's darkest days, none of its customers dared to build their own CPUs (with the exception of Apple, which does not sell them externally).

Although Nvidia currently monopolizes 90% of the AI computing market with GPU + NVLink + CUDA, the empire has already shown its first crack.

    01

    GPUs that were not originally designed for AI

    From the beginning, GPUs were not born for AI.

In October 1999, Nvidia released the GeForce 256, a graphics processing chip built on TSMC's 220-nanometer process with 23 million transistors. Nvidia took the initials of Graphics Processing Unit to coin the term "GPU" and billed the GeForce 256 as "the world's first GPU", cleverly defining a new category and occupying users' minds to this day.

At the time, artificial intelligence had been dormant for years, especially in the field of deep neural networks. Future Turing Award winners such as Geoffrey Hinton and Yann LeCun were still warming the academic bench, never imagining that their careers would be completely transformed by a GPU originally built for gamers.

So who were GPUs born for? Images. More precisely, they were born to free the CPU from the drudgery of graphics display. The basic principle of image display is to break each frame into individual pixels and then run them through multiple rendering stages such as vertex processing, pixel processing, rasterization, fragment processing, and pixel operations before finally putting them on the screen.

The rendering process from pixels to an image. Image source: graphics supplement

Why call it drudgery? A simple back-of-the-envelope calculation:

Assume a screen with 300,000 pixels running at 60 fps. That requires 18 million pixel-rendering operations per second, each of which goes through the five steps above, each step corresponding to one instruction. In other words, the CPU has to execute 90 million instructions per second just to keep images on the screen. As a reference, Intel's highest-performing CPU at the time could manage only 60 million operations per second.
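As a sanity check on the arithmetic, here is a minimal sketch in Python; the pixel count, frame rate, and five-step-per-pixel assumption are taken directly from the example above.

```python
# Back-of-the-envelope sketch of the rendering workload described above.
pixels_per_frame = 300_000        # assumed screen resolution in the example
frames_per_second = 60            # target frame rate
steps_per_pixel = 5               # vertex, pixel, raster, fragment, pixel-op stages

pixel_ops_per_second = pixels_per_frame * frames_per_second       # 18,000,000
instructions_per_second = pixel_ops_per_second * steps_per_pixel  # 90,000,000

print(f"{pixel_ops_per_second:,} pixel renderings per second")
print(f"{instructions_per_second:,} instructions per second just for display")
```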

It's not that the CPU is weak; rather, its specialty is thread scheduling, so more die area is given to control and storage units, while the compute units that actually do the math occupy only about 20% of the space. GPUs are the opposite: over 80% of their area is compute units, giving them strong parallel-computing capability and making them better suited to work like image display, where the steps are fixed, repetitive, and tedious.

The internal structure of a CPU versus a GPU; the green portions are compute units

It was only a few years later that some AI researchers realized GPUs with these characteristics were also well suited to training deep neural networks. Many classic deep neural network architectures had been proposed as early as the second half of the 20th century, but without computing hardware capable of training them, much of the research stayed on paper and progress stagnated for a long time.

A gunshot in October 1999 delivered the GPU to artificial intelligence. The training process of deep learning performs layered computations on each input according to the functions and parameters of each layer of the neural network, finally producing an output value. Like graphics rendering, this requires enormous amounts of matrix arithmetic, which happens to be exactly what GPUs do best.

A typical deep neural network architecture; image source: towards data science
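To make the "layered matrix operations" concrete, here is a minimal sketch of a two-layer forward pass in numpy; the batch size, layer widths, and ReLU activation are illustrative assumptions rather than details from the article.

```python
import numpy as np

# A toy two-layer forward pass: each layer is a matrix multiply plus a
# nonlinearity, which is why deep learning maps so naturally onto
# matrix-multiplication hardware. Shapes are arbitrary.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 784))    # a batch of 64 flattened inputs
W1 = rng.standard_normal((784, 256))  # layer-1 weights
W2 = rng.standard_normal((256, 10))   # layer-2 weights

h = np.maximum(x @ W1, 0)             # matrix multiply + ReLU
y = h @ W2                            # another matrix multiply -> output values
print(y.shape)                        # (64, 10)
```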

However, although image display involves huge amounts of data, most of its steps are fixed. Once deep neural networks are applied to decision-making, they involve complications such as branch structures, and the parameters of every layer must be continuously adjusted through training on massive amounts of feedback data. These differences planted hidden risks for how well GPUs would adapt to AI later on.

Kumar Chellapilla, now general manager of AI/ML at Amazon, was one of the earliest researchers to take the plunge with GPUs. In 2006 he implemented a convolutional neural network (CNN) on Nvidia's GeForce 7800 graphics card and found it four times faster than a CPU. This is the earliest known attempt to use GPUs for deep learning [3].

Kumar Chellapilla and the Nvidia GeForce 7800

Kumar's work did not attract much attention, mainly because programming on GPUs was still highly complex. But in 2007 Nvidia launched the CUDA platform, which greatly lowered the barrier for developers to train deep neural networks on GPUs and gave deep learning enthusiasts new hope.

Then, in 2009, Andrew Ng and others at Stanford published a landmark paper [6] in which GPUs, offering more than 70 times the computing power of CPUs, cut AI training time from weeks to hours. The paper pointed the way for the hardware implementation of artificial intelligence: GPUs would greatly accelerate AI's journey from paper to reality.

    Andrew Ng (Wu Enda)

It is worth mentioning that Andrew Ng joined Google Brain in 2011 and was one of the leaders of the Google Cat project mentioned at the beginning. Why Google Brain ultimately did not use GPUs is unknown to outsiders; rumor attributes it to Google's ambivalent attitude toward GPUs, which only became clear after Ng left Google and joined Baidu.

After countless explorations, the baton finally passed to deep learning master Hinton, and by then the calendar already read 2012.

In 2012, Hinton and his students Alex Krizhevsky and Ilya Sutskever designed the deep convolutional neural network AlexNet and planned to enter that year's ImageNet competition. The problem was that training AlexNet on CPUs might take several months, so they turned to GPUs.

The GPU that proved pivotal in the history of deep learning was the famous "nuclear bomb graphics card", the GTX 580. As the flagship of Nvidia's latest Fermi architecture, the GTX 580 was packed with 512 CUDA cores (versus 108 in the previous generation). While computing power leapt forward, its exaggerated power consumption and heat also earned Nvidia the nickname "Nuclear Bomb Factory".

One man's arsenic is another man's honey. Compared with the smoothness of training neural networks on GPUs, the heat problem hardly merited a mention. The Hinton team completed the programming with Nvidia's CUDA platform, and with the support of two GTX 580 graphics cards, training on 14 million images took only about a week. AlexNet went on to win the championship.

Thanks to the ImageNet competition and Hinton's personal influence, artificial intelligence researchers everywhere suddenly realized the importance of GPUs.

Two years later, Google entered ImageNet with the GoogLeNet model and won the championship with 93% accuracy, again using Nvidia GPUs. That year, the number of GPUs used across all participating teams soared to 110. Beyond the competition, GPUs became standard equipment for deep learning, bringing Jensen Huang a steady stream of orders.

This helped Nvidia shake off the shadow of its rout in the mobile market. After the iPhone's release in 2007, the smartphone chip pie expanded rapidly, and Nvidia tried to grab a slice from Samsung, Qualcomm, and MediaTek, but its Tegra processors failed because of heat problems. In the end, it was the AI field, itself rescued by GPUs, that handed Nvidia a second growth curve in return.

But GPUs were not born to train neural networks, and the faster artificial intelligence developed, the more thoroughly these problems were exposed.

For example, although GPUs differ significantly from CPUs, both fundamentally follow the von Neumann architecture, in which storage and computation are separated. The efficiency bottleneck this separation creates is tolerable for image processing, where the steps are relatively fixed and can be handled with more parallelism, but it is critical in neural networks full of branch structures.

Every time a neural network adds a layer or a branch, it needs another round of memory access to store intermediate data for later use, and the time spent on this is unavoidable. In the era of large models especially, the bigger the model, the more memory accesses it has to perform, and in the end the energy consumed by memory access far exceeds that consumed by computation.

A simple analogy: the GPU is a muscle-bound hulk packed with compute units, but for every instruction it receives it has to turn around and flip through the manual (memory). As models grow larger and more complex, the time the hulk spends actually working shrinks, and instead it flips through the manual so often that it froths at the mouth.

Memory problems are just one of the GPU's many discomforts in deep neural network applications. Nvidia was aware of these issues from the start and quickly began "modding" its GPUs to better fit AI workloads; meanwhile, AI players who saw the situation just as clearly were moving in the shadows, trying to use the GPU's flaws to pry open a corner of Jensen Huang's empire.

    A battle of attack and defense began.

    02

    The Secret War between Google and Nvidia

Faced with overwhelming demand for AI computing power and the GPU's congenital defects, Jensen Huang offered two sets of solutions, advanced hand in hand.

The first was to keep brute-force stacking computing power along the path of "immortal computing power, boundless magic". In an era when demand for AI compute doubles every 3.5 months, compute is the carrot dangling in front of artificial intelligence companies, making them curse Jensen Huang's superb knife skills while snapping up every scrap of Nvidia's capacity like devoted fans.

The second was to gradually resolve the mismatch between GPUs and AI workloads through incremental innovation. The issues include, but are not limited to, power consumption, the memory wall, bandwidth bottlenecks, low-precision computing, high-speed interconnects, optimization for specific models... Starting in 2012, Nvidia sharply accelerated the pace of its architecture updates.

After releasing CUDA, Nvidia used a unified architecture to support its two major scenarios, graphics and computing. The first generation of that line arrived in 2007 and was named Tesla, not because Jensen Huang wanted to flatter Musk, but as a tribute to the physicist Nikola Tesla (the very earliest generation was the Curie architecture).

Since then, each generation of Nvidia GPU architecture has been named after a famous scientist, as shown in the figure below. With each iteration, Nvidia kept piling on computing power while making improvements that stopped short of drastic structural surgery.

For example, the second-generation Fermi architecture of 2011 suffered from heat dissipation problems, so the third-generation Kepler architecture of 2012 shifted the overall design philosophy from raw performance to energy efficiency to improve heat dissipation; and to solve the "muscle-bound fool" problem mentioned earlier, the fourth-generation Maxwell architecture of 2014 added more logic control circuitry for finer-grained control.

To adapt to AI workloads, Nvidia's "modded" GPUs have in some respects grown more and more CPU-like. Just as the CPU's excellent scheduling ability comes at the cost of raw computing power, Nvidia has had to restrain itself in stacking compute cores. But however well a general-purpose GPU carries the burden, it is hard for it to match dedicated chips in AI scenarios.

The first to challenge Nvidia was Google, which had also been the first to buy GPUs at scale for AI computing.

After flexing its muscles with GoogLeNet in 2014, Google stopped publicly entering image recognition competitions and instead plotted the development of AI-specific chips. In 2016, Google struck first with AlphaGo, and after defeating Lee Sedol it immediately unveiled its self-developed AI chip, the TPU, catching Nvidia off guard with a new architecture "born for AI".

TPU stands for Tensor Processing Unit. If Nvidia's "modding" of the GPU amounts to robbing Peter to pay Paul, the TPU fundamentally reduces the demands on storage and interconnect and hands as much chip area as possible over to computation. It does this with two main techniques:

The first is quantization. Modern computer arithmetic typically uses high-precision data, which consumes a lot of memory, yet most neural network calculations do not actually need 32-bit or 16-bit floating-point precision. The essence of quantization is to approximate 32-bit/16-bit floating-point numbers with 8-bit integers, keeping accuracy adequate while cutting storage requirements.
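As a rough illustration, the sketch below quantizes a tensor of 32-bit floats to 8-bit integers and back in Python; the symmetric per-tensor scaling scheme and the random weights are assumptions for demonstration, not a description of the TPU's actual circuitry.

```python
import numpy as np

# Symmetric 8-bit quantization sketch: map float32 values onto int8,
# then dequantize and measure the approximation error.
weights = np.random.default_rng(1).standard_normal(1000).astype(np.float32)

scale = np.abs(weights).max() / 127.0                 # largest magnitude -> 127
q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

dequantized = q.astype(np.float32) * scale            # approximate original values
max_err = np.abs(weights - dequantized).max()
print(f"storage shrinks 4x, max abs error = {max_err:.4f}")
```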

The second is the systolic array, i.e. the matrix multiplication array, which is also one of the most crucial differences between the TPU and the GPU. Simply put, neural network computation requires huge amounts of matrix arithmetic, and a GPU can only decompose a matrix calculation, step by step, into many vector calculations; after each batch it must access memory to save that layer's results, and only when all the vector calculations are finished are the per-layer results combined into the output.

In the TPU, thousands of compute units are wired directly together into a matrix multiplication array that serves as the computing core, so matrix calculations can be carried out directly. Apart from loading the data and weights at the start, there is no need to access memory, which drastically cuts access frequency, greatly speeds up the TPU's computation, and sharply reduces its energy consumption and physical footprint.

Comparison of memory access counts for the CPU, GPU, and TPU
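The sketch below contrasts the two styles conceptually in Python: a row-by-row decomposition that writes an intermediate result back after every vector calculation, versus a single fused matrix multiply in the spirit of a systolic array. It is a toy model under assumed sizes, not a simulation of either chip.

```python
import numpy as np

# Contrast "decompose into vector calculations with per-step write-backs"
# against "compute the whole matrix product in one go".
rng = np.random.default_rng(2)
A, B = rng.standard_normal((256, 256)), rng.standard_normal((256, 256))

out = np.empty_like(A)
write_backs = 0
for i in range(A.shape[0]):
    out[i] = A[i] @ B        # one vector-matrix calculation
    write_backs += 1         # intermediate result stored before the next step

fused = A @ B                # operands loaded once, product computed directly
print(write_backs, "write-backs vs 1 fused result; same answer:",
      np.allclose(out, fused))
```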

Google developed the TPU very quickly, taking only 15 months from design and validation to mass production and deployment in its own data centers. In testing, the TPU's performance and power efficiency on AI workloads such as CNNs, LSTMs, and MLPs comprehensively outclassed Nvidia's GPUs of the same period. All of the pressure suddenly landed on Nvidia.

Being stabbed in the back by a major customer is unpleasant, but Nvidia was not going to stand there and take the beating, and a tug of war began.

Five months after Google unveiled the TPU, Nvidia rolled out the 16nm Pascal architecture. On one hand, the new architecture introduced the famous NVLink high-speed bidirectional interconnect, greatly improving connection bandwidth; on the other, it imitated the TPU's quantization technique, improving the efficiency of neural network computation by lowering data precision.

In 2017, Nvidia launched Volta, its first architecture designed specifically for deep learning, which introduced Tensor Cores dedicated to matrix operations. Its 4x4 multiply arrays may look shabby next to the TPU's 256x256 systolic array, but that was the compromise required to preserve flexibility and versatility.

The 4x4 matrix operation implemented by a Tensor Core in the Nvidia V100
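To show the idea of a small multiply-accumulate building block, here is a minimal Python sketch of a matrix multiply composed from 4x4 tiles, each step adding one tile product into an accumulator; the tile loop and matrix sizes are illustrative assumptions, not how Tensor Cores are actually programmed.

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Multiply square matrices by accumulating 4x4 tile products,
    mimicking a small multiply-accumulate building block."""
    n = A.shape[0]
    D = np.zeros((n, n))
    for i in range(0, n, tile):
        for j in range(0, n, tile):
            acc = np.zeros((tile, tile))              # accumulator tile
            for k in range(0, n, tile):
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            D[i:i+tile, j:j+tile] = acc
    return D

rng = np.random.default_rng(3)
A, B = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
print(np.allclose(tiled_matmul(A, B), A @ B))         # True
```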

Nvidia executives declared to customers: "Volta is not an upgrade of Pascal; it is a completely new architecture."

Google was also racing against the clock. After 2016, the TPU went through three generations in five years: TPUv2 in 2017, TPUv3 in 2018, and TPUv4 in 2021, with the benchmark numbers shoved right in Nvidia's face: TPU v4 is 1.2 to 1.7 times faster than Nvidia's A100 while cutting power consumption by a factor of 1.3 to 1.9.

Google does not sell TPU chips externally and continues to buy Nvidia GPUs in large quantities, which keeps the two companies' AI chip rivalry a "cold war" rather than an open contest. But Google deploys TPUs in its own cloud service system and sells AI computing services to the outside world, which undoubtedly shrinks Nvidia's potential market.

Google CEO Sundar Pichai shows off the TPU v4

While the two fought in the shadows, progress in artificial intelligence surged ahead. In 2017, Google proposed the revolutionary Transformer model, and OpenAI subsequently developed GPT-1 on top of the Transformer. The large-model arms race broke out, and demand for AI compute entered its second acceleration since AlexNet appeared in 2012.

Sensing the new trend, Nvidia launched the Hopper architecture in 2022, introducing a Transformer acceleration engine at the hardware level for the first time and claiming to speed up the training of Transformer-based large language models ninefold. On top of Hopper, Nvidia launched "the most powerful GPU on the planet", the H100.

The H100 is Nvidia's ultimate stitched-together monster. On one hand, it incorporates a range of AI optimization technologies: quantization, fourth-generation Tensor Cores, and the Transformer acceleration engine; on the other, it is stuffed with Nvidia's traditional strengths: 7296 CUDA cores, 80GB of HBM2 memory, and NVLink 4.0 interconnect at up to 900GB/s.

With the H100 in hand, Nvidia could breathe a sigh of relief for the moment: there is no mass-produced chip on the market that can beat it.

The secret tug of war between Google and Nvidia has also been a mutual achievement: Nvidia has absorbed many of Google's innovations, and Google's cutting-edge AI research has benefited fully from the innovation in Nvidia's GPUs. Together, the two have brought the cost of AI compute down to a level that large language models can just about reach on tiptoe. Those now in the limelight, such as OpenAI, also stand on the shoulders of these two giants.

But feelings are feelings, and business is business. The battle of attack and defense around the GPU has made the industry more certain of one thing: the GPU is not the optimal solution for AI, and custom-built ASICs have a real chance of breaking Nvidia's monopoly. The crack has opened, and Google is naturally not the only one to have followed the scent.

Especially now that computing power has become the most certain demand of the AGI era, everyone wants to sit at the same dinner table as Nvidia.

    03

    A widening crack

Besides OpenAI, two other companies have shot to fame in this round of the AI boom. One is the AI image generation company Midjourney, whose command of a wide range of art styles has left countless carbon-based artists trembling; the other is Anthropic, founded by former OpenAI staff, whose chatbot Claude has been trading blows with ChatGPT.

But neither of these companies bought Nvidia GPUs to build supercomputers; instead, they used Google's computing services.

To meet the explosion in AI compute, Google has built a supercomputer (the TPU v4 Pod) containing 4096 TPUs interconnected with self-developed optical circuit switches (OCS). It can be used not only to train Google's own large language models such as LaMDA, MUM, and PaLM, but also to provide AI startups with affordable, high-quality service.

Google's TPU v4 Pod supercomputer

Tesla has also gone the DIY route on supercomputing. After launching its in-car FSD chip, Tesla showed off the Dojo ExaPOD supercomputer, built from 3000 of its own D1 chips, in August 2021. The D1 chip is manufactured by TSMC on a 7nm process, and those 3000 D1 chips make Dojo the fifth most powerful computer in the world.

Even so, neither of these compares to the shock of Microsoft's self-developed Athena chip.

Microsoft is one of Nvidia's largest customers; its Azure cloud service has bought at least tens of thousands of high-end A100 and H100 GPUs. These will have to support not only ChatGPT's massive conversational workload but also a whole line of AI-powered products such as Bing, Microsoft 365, Teams, GitHub, and SwiftKey.

Counted carefully, the "Nvidia tax" Microsoft has to pay is an astronomical sum, and developing its own chips is almost inevitable. It is just like Alibaba, which calculated Taobao's and Tmall's future demand for cloud computing, databases, and storage, found the figures equally astronomical, resolutely began backing Alibaba Cloud, and launched a vigorous "de-IOE" campaign (replacing IBM, Oracle, and EMC) internally.

Saving costs is one side of it; vertical integration for differentiation is the other. In the mobile phone era, Samsung produced and consumed its own CPUs (APs), memory, and screens, which contributed greatly to its dominance of the global Android market. Google's and Microsoft's chip efforts likewise aim to optimize their own cloud services at the chip level and create differentiation.

So unlike Apple and Samsung, which simply do not sell chips externally, Google and Microsoft, though they will not sell their AI chips either, will use "AI compute cloud services" to absorb some of Nvidia's potential customers. Midjourney and Anthropic are the examples at hand, and in the future more small companies (especially at the AI application layer) will choose cloud services.

The global cloud computing market is highly concentrated: the top five vendors (Amazon AWS, Microsoft Azure, Google Cloud, Alibaba Cloud, and IBM) hold over 60% of it, and all of them are developing their own AI chips. Among them, Google is moving fastest, IBM has the deepest reserves, Microsoft has made the biggest impact, Amazon has kept the tightest secrets, and Alibaba faces the most difficulties.

The fate of Zeku, OPPO's in-house chip unit, casts a shadow over every latecomer. But the overseas giants can use money to build their own pipelines of talent and technology: when Tesla started on FSD it recruited the Silicon Valley legend Jim Keller, and when Google developed the TPU it brought in Professor David Patterson, Turing Award winner and pioneer of the RISC architecture.

Besides the big corporations, some small and medium-sized companies are also trying to carve off a slice of Nvidia's cake, such as Graphcore, once valued at $2.8 billion, and China's Cambricon. The table below lists the better-known AI chip startups around the world.

The difficulty for AI chip startups is that, lacking the sustained heavy investment of the big players and unable to produce and consume chips in-house the way Google does, they have essentially no chance of winning a head-on fight with Nvidia unless their technical path is unique or their advantages are overwhelming: Nvidia's cost and ecosystem advantages can erase almost every customer doubt.

The impact of startups on Nvidia is limited; Jensen Huang's real worry remains the big customers whose actions speak louder than their words.

Of course, the big companies still cannot do without Nvidia. Even though Google's TPU is in its fourth generation, it still has to buy GPUs in large numbers to work alongside the TPUs; and for all of Tesla's boasting about the Dojo supercomputer, Musk still chose to buy 10,000 Nvidia GPUs when founding his new AI company.

But Jensen Huang has already tasted the plastic friendship of big customers, courtesy of Musk. In 2018, Musk publicly announced that Tesla would develop its own self-driving chip (it was using Nvidia's DRIVE PX at the time); Huang was grilled about it by analysts on an earnings call and for a moment could not get off the hook. Afterwards Musk issued a "clarification", but a year later Tesla left Nvidia without looking back.

The big players never show mercy when it comes to cutting costs. In the PC era, although Intel's chips were sold to businesses, consumers had strong autonomy in their choices, so manufacturers had to advertise "Intel Inside"; in the era of compute-as-a-cloud-service, the giants can hide all the underlying hardware information. If they likewise buy 100 TFLOPS of computing power, can consumers tell which part comes from a TPU and which from a GPU?

So Nvidia ultimately has to face the question: the GPU was not born for AI, but will the GPU be the optimal solution for AI?

Over 17 years, Jensen Huang has taken the GPU out of its single niche of gaming and image processing and turned it into a general-purpose computing tool: when the mining boom came it chased mining, when the metaverse got hot it chased the metaverse, and when AI arrived it chased AI, constantly "modding" the GPU for each new scene while seeking a balance between "generality" and "specificity".

Looking back over Nvidia's past two decades, it has launched countless technologies that changed the industry: the CUDA platform, Tensor Cores, RT Cores (ray tracing), NVLink, the cuLitho computational lithography platform, mixed precision, Omniverse, the Transformer engine... These technologies have turned Nvidia from a second-tier chip company into the industry's market-cap leader, a genuinely inspiring story.

But every era should have its own computing architecture. Artificial intelligence is advancing at breakneck speed, with breakthroughs measured in hours. If AI's penetration of human life is to rise dramatically, the way PCs and smartphones once did, the cost of computing may need to fall by 99%, and the GPU may not be the only answer.

History tells us that even the most prosperous empire has to beware of that inconspicuous crack.

That is the end of the article. Thank you for reading.

About the "Silicon-Based Research Society": a new account under the Yuanchuan Research Institute, focused on just three areas of research: artificial intelligence, robotics, and chips. Deep thinking, industry connections, trend tracking. Stay tuned.

    Reference materials

    [1] ImageNet Classification with Deep Convolutional Neural Networks, Hinton

    [2] Microsoft Readies AI Chip as Machine Learning Costs Surge, The Information

    [3] High Performance Convolutional Neural Networks for Document Processing

    [4] Google's Cloud TPU v4 provides exaFLOPS scale ML with industry leading efficiency

    [5] Tesla's AI Ambition, Far East Research Institute

    [6] Large scale Deep Unsupervised Learning using Graphics Processors

    Author: He Luheng/Boss Dai

    Editor: Boss Dai

    Visual design: Shurui

    Responsible editor: Li Motian
