At its I/O conference tomorrow Google will unveil a preview of Google Cloud’s latest machine-learning clusters, which not only aim for nine exaflops of peak performance, but do so using 90% carbon-free energy. It will be the world’s largest publicly available machine learning hub.
At the heart of the new clusters is the TPU V4 Pod. These tensor processing units were announced at Google I/O last year, and AI teams from the likes of Meta, LG, and Salesforce have already had access to the pods. The V4 TPUs allow researchers to use the framework of their choice, whether TensorFlow, JAX, or PyTorch, and have already enabled breakthroughs at Google Research in areas such as language understanding, computer vision, and speech recognition.
The clusters are based in Google’s Oklahoma data center, and their workloads are expected to be similar: chewing through data in fields such as natural language processing, computer vision, and recommendation systems.
Access to the clusters is offered in slices, ranging from four chips (one TPU VM) all the way up to thousands of them. Slices with at least 64 chips utilize three-dimensional torus links, providing higher bandwidth for collective communication operations. The V4 chips are also capable of accessing twice as much memory as the previous generation — 32 GiB, up from 16 — and double the acceleration speed when training large-scale models.
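To make those numbers concrete, here is a small illustrative sketch (not an official Google Cloud API — the function and field names are invented for this example) that derives a slice's VM count, aggregate memory, and interconnect topology from the figures above: four chips per TPU VM, 32 GiB of memory per chip, and a 3D torus for slices of 64 chips or more.

```python
def describe_v4_slice(num_chips: int) -> dict:
    """Hypothetical helper: summarize a Cloud TPU v4 slice request.

    Based on figures from the announcement:
      - slices start at 4 chips, i.e. one TPU VM (4 chips per VM)
      - each v4 chip can access 32 GiB of memory (double the v3's 16 GiB)
      - slices of 64+ chips are wired as a three-dimensional torus
    """
    if num_chips < 4 or num_chips % 4 != 0:
        raise ValueError("slices are allocated in multiples of 4 chips")
    return {
        "chips": num_chips,
        "tpu_vms": num_chips // 4,            # one VM per 4 chips
        "memory_gib": num_chips * 32,         # 32 GiB per chip
        "topology": "3D torus" if num_chips >= 64 else "standard",
    }

# A 64-chip slice: 16 TPU VMs, 2048 GiB of memory, 3D torus interconnect.
print(describe_v4_slice(64))
```

This is only an aid for reading the paragraph above; actual slice shapes and provisioning go through Google Cloud's own tooling.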
“In order to make advanced AI hardware more accessible, a few years ago we launched the TPU Research Cloud (TRC) program that has provided access at no charge to TPUs to thousands of ML enthusiasts around the world,” said Jeff Dean, SVP, Google Research and AI. “They have published hundreds of papers and open-source GitHub libraries on topics ranging from ‘Writing Persian poetry with AI’ to ‘Discriminating between sleep and exercise-induced fatigue using computer vision and behavioral genetics’. The Cloud TPU v4 launch is a major milestone for both Google Research and our TRC program, and we are very excited about our long-term collaboration with ML developers around the world to use AI for good.”
Google’s sustainability commitment means that the company has been matching its data centers’ energy usage with renewable energy purchases since 2017, and by 2030 it aims to run its entire business on renewable energy. The V4 TPU is also more energy efficient than previous generations, producing three times the FLOPS per watt of the V3 chip.
Access to Cloud TPU v4 Pods comes in evaluation (on-demand), preemptible, and committed use discount (CUD) options, and is being offered to all Google AI Cloud users.