Sunday December 04, 2022

Meta has created an AI supercomputer that it claims will be the world’s fastest once complete in mid-2022.

Meta, the social media conglomerate, is the latest tech company to build an “AI supercomputer”: a high-speed computer designed specifically to train machine learning systems. According to the company, its new AI Research SuperCluster (or RSC) is already among the most powerful machines of its kind and will be the world’s most powerful when it is complete in mid-2022. Meta CEO Mark Zuckerberg said the company has created what it believes is the world’s fastest AI supercomputer, called RSC for AI Research SuperCluster, and that it will be completed later in the year.
The news demonstrates the importance of AI research to companies like Meta. Rivals such as Nvidia and Microsoft have already announced their own “AI supercomputers,” a slightly different breed from what we usually think of as regular supercomputers. RSC will be used to train a range of systems across Meta’s businesses, from content moderation algorithms that detect hate speech on Facebook and Instagram to augmented reality features that will one day ship in the company’s future AR hardware. Meta also confirms that RSC will be used to design experiences for the metaverse, the company’s insistent branding for an interconnected series of virtual spaces, from offices to online venues.
“RSC will assist Meta’s AI researchers to build new and more powerful AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text and images together; develop new augmented reality tools; and much, much more,” wrote Shubho Sengupta and Kevin Lee, Meta engineers.
“We hope RSC can help us build entirely new AI systems that can, for example, power real-time voice translations to large groups of people, each speaking a different language, so they can seamlessly collaborate on a research project or play an AR game together.”
[Image: Meta’s AI supercomputer will be completed by mid-2022. Credit: Meta]

Work began a year ago, with Meta’s engineers designing the machine’s various systems (cooling, power, networking, and cabling) entirely from scratch. Phase one of RSC is already up and running. It consists of 760 Nvidia DGX A100 systems containing 6,080 connected GPUs, a type of processor that is particularly adept at machine learning problems. Meta claims RSC has already provided up to 20x better performance on its standard machine vision research tasks.
Phase two of RSC will be completed later in 2022. At that point, it will contain some 16,000 GPUs and be able “to train AI systems with more than a trillion parameters on data sets as large as an exabyte.” The raw number of GPUs is only a rough proxy for a system’s overall performance, but for comparison, Microsoft’s AI supercomputer, built with research lab OpenAI, uses 10,000 GPUs.
These numbers are impressive, but they raise the question: what is an AI supercomputer, anyway? And how does it compare with what we normally think of as supercomputers, the vast machines used by governments and universities to crunch numbers in complex domains such as space, nuclear physics, and climate change?
The two types of systems, both known as high-performance computers (or HPCs), are more alike than they are distinct. Both are closer in scale to datacenters than to individual computers, and both rely on large numbers of interconnected processors exchanging data at lightning-fast speeds. But there are some key differences, Bob Sorensen, an HPC analyst at Hyperion Research, told The Verge. AI-based HPCs live in a somewhat different world from their traditional HPC counterparts, says Sorensen, and the big difference is accuracy.
The short explanation is that machine learning requires less accuracy than the tasks given to traditional supercomputers, so “AI supercomputers” (a bit of recent branding) can carry out more calculations per second than their regular brethren using the same hardware. That means Meta’s claim to be building the “world’s fastest AI supercomputer” is not a direct comparison to the supercomputers you often see in the news, such as those on the TOP500 list, which publishes its rankings twice a year.
The difference comes down to how supercomputers and AI supercomputers use floating-point math, a mathematical shorthand that is extremely useful for calculations involving very large and very small numbers. The decimal point is the “floating point,” which “floats” between significant digits. The accuracy of floating-point calculations can be adjusted through different formats: most traditional supercomputers measure their speed in 64-bit floating-point operations per second, or FLOPS, while AI supercomputers are often measured in 32-bit or even 16-bit FLOPS, since AI calculations require less accuracy. This is why comparing the two types of systems is not always apples to apples. That caveat, however, does not diminish the incredible power and capacity of AI supercomputers.
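The precision tradeoff is easy to see in practice. The following is a minimal sketch using NumPy; it illustrates the floating-point formats themselves, not any particular system mentioned above:

```python
import numpy as np

# Machine epsilon: the smallest gap from 1.0 that each format can
# still represent. Narrower formats trade accuracy for throughput.
for dtype in (np.float64, np.float32, np.float16):
    print(dtype.__name__, np.finfo(dtype).eps)

# A tiny increment survives at 64-bit precision...
assert np.float64(1.0) + np.float64(1e-4) != np.float64(1.0)
# ...but is rounded away entirely at 16-bit precision.
assert np.float16(1.0) + np.float16(1e-4) == np.float16(1.0)
```

For many machine learning workloads, losing those extra digits barely affects the trained model, which is why AI hardware leans on the narrower formats.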
Sorensen adds one final word of caution. As is often the case when assessing hardware by the “speeds and feeds” method, top speeds are not always representative. “HPC vendors often quote performance numbers that indicate how fast their machine can run,” which Sorensen calls the theoretical peak performance. “However, the true measure of a system design is its ability to run fast on the jobs it was designed to do.” It is not unusual, he notes, for HPCs to achieve less than 25 percent of their theoretical peak performance when running real-world applications.
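Sorensen’s distinction can be illustrated with back-of-the-envelope arithmetic. The per-GPU figure below is Nvidia’s published 16-bit tensor peak for the A100; the 25 percent efficiency is purely the illustrative upper bound from the quote above, not a measured number for RSC:

```python
# Illustrative peak-vs-sustained arithmetic (not Meta's actual figures).
gpus = 6080                  # RSC phase one GPU count
peak_per_gpu_tflops = 312    # A100 16-bit tensor-core peak, per Nvidia

theoretical_peak = gpus * peak_per_gpu_tflops   # aggregate TFLOPS
sustained = theoretical_peak * 0.25             # if only 25% is achieved

print(f"theoretical peak: {theoretical_peak / 1000:,.0f} PFLOPS")
print(f"sustained @ 25%:  {sustained / 1000:,.0f} PFLOPS")
```

The gap between the two numbers is exactly why analysts like Sorensen prefer measured application performance over vendor headline figures.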
The true utility of supercomputers lies in the work they do, not their theoretical peak performance. For Meta, that work means building moderation systems at a time of low trust in the company, and creating a new computing platform that can compete with rivals like Google, Microsoft, and Apple. An AI supercomputer gives the company the raw power it needs, but Meta still has to devise the winning strategy on its own.
