
Cerebras Systems Sets Record for Largest AI Models Trained on a Single Device

Cerebras Systems, the pioneer in high-performance artificial intelligence (AI) computing, announced today the ability to train models with up to 20 billion parameters on a single CS-2 system for the first time, a feat not previously possible on any single device. By allowing a single CS-2 to train these models, Cerebras reduces the systems engineering time required to run large natural language processing (NLP) models from months to minutes. It also eliminates one of the most painful aspects of NLP work: partitioning the model across hundreds or thousands of small graphics processing units (GPUs).

“In NLP, bigger models are shown to be more accurate. But traditionally, only a very select few companies had the resources and expertise necessary to do the painstaking work of breaking up these large models and spreading them across hundreds or thousands of graphics processing units. As a result, only very few companies could train large NLP models – it was too expensive, time-consuming and inaccessible for the rest of the industry. Today we are proud to democratize access to GPT-3XL 1.3B, GPT-J 6B, GPT-3 13B and GPT-NeoX 20B, enabling the entire AI ecosystem to set up large models in minutes and train them on a single CS-2,” said Andrew Feldman, CEO and Co-Founder of Cerebras Systems.

“GSK generates extremely large datasets through its genomic and genetic research, and these datasets require new equipment to conduct machine learning. The Cerebras CS-2 is a critical component that allows GSK to train language models using biological datasets at a scale and size previously unattainable. These foundational models form the basis of many of our AI systems and play a vital role in the discovery of transformational medicines,” said Kim Branson, SVP of Artificial Intelligence and Machine Learning at GSK.

These world-first capabilities are enabled by the combination of the size and computational resources of the Cerebras Wafer Scale Engine-2 (WSE-2) and the Weight Streaming software architecture extensions made available with the release of version R1.4 of the Cerebras Software Platform, CSoft.
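The announcement does not detail how Weight Streaming works internally, but the general idea behind weight-streaming designs can be sketched in plain Python/NumPy. The sketch below is purely conceptual and assumed for illustration (it is not Cerebras's CSoft API): parameters live in external storage and are streamed through a single device one layer at a time, so model size is bounded by external capacity rather than on-device memory.

```python
# Conceptual sketch only (plain Python/NumPy, not Cerebras code): stream one
# layer's weights at a time through a single device instead of partitioning
# the whole model across many processors.
import numpy as np

def load_layer_weights(layer_idx, hidden=1024):
    """Stand-in for fetching one layer's weights from external parameter storage."""
    rng = np.random.default_rng(layer_idx)
    return rng.standard_normal((hidden, hidden)).astype(np.float32)

def forward_streaming(x, num_layers=24):
    # Only one layer's weights are resident at a time; activations stay on-device.
    for layer_idx in range(num_layers):
        w = load_layer_weights(layer_idx)   # stream the layer's weights in
        x = np.maximum(x @ w, 0.0)          # run the layer (ReLU MLP as a stand-in)
        del w                               # release the weights before the next layer
    return x

activations = forward_streaming(np.ones((8, 1024), dtype=np.float32))
```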

AI training is simple when a model fits on a single processor. However, when a model has more parameters than memory can hold, or a layer demands more compute than a single processor can deliver, complexity skyrockets. The model must be divided and distributed over hundreds or thousands of GPUs, a painful procedure that can take months to complete. To make matters worse, the procedure is unique to each pairing of neural network and compute cluster, so the work is not transferable to other compute clusters or other neural networks; each setup is entirely bespoke.
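To make the distribution problem concrete, here is a minimal, purely illustrative sketch of hand-partitioning a model across two GPUs, written in PyTorch as an assumed framework (it is not Cerebras code and requires two CUDA devices to run). The layer-to-device placement is baked into the model definition, which is why this kind of work does not carry over to a different cluster or a different network.

```python
# Illustrative sketch of manual model partitioning (assumed PyTorch, two GPUs).
# Device placement is hard-coded, so every new model size or cluster topology
# forces the partitioning work to be redone.
import torch
import torch.nn as nn

class ManuallyPartitionedMLP(nn.Module):
    def __init__(self, hidden=4096, layers_per_gpu=(12, 12)):
        super().__init__()
        # Stage 0 lives on GPU 0, stage 1 on GPU 1; a 20-billion-parameter
        # model would need many such hand-placed stages.
        self.stage0 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(layers_per_gpu[0])]
        ).to("cuda:0")
        self.stage1 = nn.Sequential(
            *[nn.Linear(hidden, hidden) for _ in range(layers_per_gpu[1])]
        ).to("cuda:1")

    def forward(self, x):
        x = self.stage0(x.to("cuda:0"))
        # Activations must be shipped between devices at every stage boundary.
        x = self.stage1(x.to("cuda:1"))
        return x

model = ManuallyPartitionedMLP()
out = model(torch.randn(8, 4096))
```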

“Cerebras’ ability to bring large language models to the masses with cost-efficient, easy access opens up an exciting new era in AI. It gives organizations that can’t spend tens of millions an easy and inexpensive on-ramp to major league NLP. It will be interesting to see the new applications and discoveries CS-2 customers make as they train GPT-3 and GPT-J class models on massive datasets,” said Dan Olds, Chief Research Officer, Intersect360 Research.
