AlphaFold 3: DeepMind evolves its AI protein folding project

DeepMind announced AlphaFold 3, the latest iteration of its protein folding project.

AlphaFold 3, like its predecessors, primarily predicts how proteins fold based on their amino acid sequences.

Proteins comprise long chains of amino acids, and how they fold like ‘origami’ into 3D structures determines their functions.

AlphaFold uses machine learning to predict the 3D structure a protein is likely to adopt when it folds.

The “protein folding problem” is fundamental in biochemistry and molecular biology because proteins are essentially the building blocks of all organic life.

Understanding how these structures fold opens the door to deciphering the mechanisms that underpin health and disease on a molecular level.

Proteins can misfold, which not only disrupts their normal function but also contributes to the development of diseases such as Alzheimer’s and Parkinson’s. Misfolded, dysfunctional proteins can accumulate and damage cells and tissues.

Protein folding and misfolding are implicated in a broad spectrum of diseases and biological processes, but predicting how a given protein will fold is a long-standing scientific challenge.

This is because the number of possible configurations a protein can take is astronomically high, making it computationally intensive to predict the correct structure through brute force methods.
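To give a feel for the scale involved, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that each amino acid can independently adopt three conformations and that a hypothetical computer can check 10^13 configurations per second; both figures are assumptions, not numbers from DeepMind.

```python
def count_conformations(num_residues: int, states_per_residue: int = 3) -> int:
    """Count configurations if every residue independently picks one state.

    The default of 3 states per residue is an illustrative assumption.
    """
    if num_residues < 1:
        raise ValueError("num_residues must be at least 1")
    return states_per_residue ** num_residues


def years_to_enumerate(num_conformations: int, samples_per_second: float = 1e13) -> float:
    """Estimate the years needed to check every configuration one by one.

    The sampling rate is a hypothetical figure chosen for illustration.
    """
    seconds_per_year = 365.25 * 24 * 3600
    return num_conformations / samples_per_second / seconds_per_year


# A modest protein of 100 amino acids
total = count_conformations(100)
print(f"Configurations: {total:.3e}")
print(f"Years to enumerate: {years_to_enumerate(total):.3e}")
```

Even under these generous assumptions, a 100-residue chain has on the order of 10^47 configurations, and checking them one at a time would take far longer than the age of the universe.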

AlphaFold sidesteps this issue of scale by using deep learning to predict protein structures directly rather than searching through every possible configuration.

At its core, it uses neural networks trained on a database of known protein structures to infer the 3D shape of proteins from their amino acid sequences.

Introducing AlphaFold 3

DeepMind recently announced AlphaFold 3, which features an improved version of the Evoformer module, part of the deep learning architecture underpinning AlphaFold 2.

Once the Evoformer module processes input molecules, AlphaFold 3 uses a novel diffusion network to assemble the predicted structures.

This network is similar to those used in AI image generators like DALL-E. It starts with a ‘cloud’ of atoms and iteratively refines the structure over a series of steps until it converges on a final, likely accurate molecular configuration.
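The idea of starting from a random cloud and refining it step by step can be illustrated with a toy Python sketch. This is not AlphaFold 3's diffusion network: the "denoiser" here is a hypothetical stand-in that already knows the target coordinates, whereas a real model learns to estimate them from the input molecules. The step count, step size, and noise schedule are all illustrative assumptions.

```python
import math
import random


def toy_denoise_step(coords, target, noise_scale, rng):
    """Move each point halfway toward its target, then add a little noise.

    A trained network would predict the correction; here the target is
    given directly, which is an assumption made to keep the toy runnable.
    """
    refined = []
    for point, goal in zip(coords, target):
        refined.append(tuple(
            p + 0.5 * (g - p) + rng.gauss(0.0, noise_scale)
            for p, g in zip(point, goal)
        ))
    return refined


def refine(target, num_steps=30, seed=0):
    """Start from a random 'cloud' and refine it over num_steps iterations."""
    rng = random.Random(seed)
    coords = [tuple(rng.gauss(0.0, 10.0) for _ in range(3)) for _ in target]
    for step in range(num_steps):
        # Noise shrinks linearly and reaches zero on the final step
        noise_scale = 1.0 - (step + 1) / num_steps
        coords = toy_denoise_step(coords, target, noise_scale, rng)
    return coords


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized point lists."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


# Hypothetical 'true' positions for five atoms, spaced along a line
target = [(float(i) * 3.8, 0.0, 0.0) for i in range(5)]
final = refine(target)
print(f"RMSD after refinement: {rmsd(final, target):.3f}")
```

The noisy early steps let the cloud move freely, while the shrinking noise in later steps lets it settle close to a single configuration.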

The AlphaFold 3 model has evolved beyond proteins alone – it also incorporates information on DNA, RNA, and small molecules and can capture some of their complex interactions.

AlphaFold 3 was trained using Protein Data Bank data. According to DeepMind, it can process over 99% of all known biomolecular complexes in this database.

Isomorphic Labs, which collaborated with DeepMind on the AlphaFold 3 project, is already working with pharmaceutical companies, applying the model to real-world drug design challenges.

DeepMind has also launched the AlphaFold Server, a free and user-friendly platform that allows researchers to harness the power of AlphaFold 3 without extensive computational resources or expertise in machine learning.

A short history of the AlphaFold project

The AlphaFold project started in 2016, shortly after AlphaGo’s historic victory against Lee Sedol, a top international Go player.

In 2018, DeepMind debuted AlphaFold 1, the first version of the AI system, at the CASP13 (Critical Assessment of Protein Structure Prediction) challenge.

This biennial competition brings together research groups from around the world to test the accuracy of their protein structure predictions against real experimental data.

AlphaFold 1 placed first in the competition, a massive milestone in computational biology.

Two years later, at CASP14 in 2020, DeepMind presented AlphaFold 2, demonstrating an accuracy so high that many in the scientific community considered the protein-folding problem essentially solved for single protein chains.

AlphaFold 2’s performance was remarkable. It achieved a median accuracy score of 92.4 GDT (Global Distance Test) across all targets.

To put this into perspective, a score of 90 GDT is considered competitive with results obtained from experimental methods. The AlphaFold 2 methods paper has since received over 20,000 citations, placing it among the top 500 most-cited papers across all scientific fields.
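For readers curious about how a GDT score is built up, the sketch below computes the commonly used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4, and 8 ångströms, of the percentage of residues whose predicted position falls within that cutoff of the experimental position. It is a simplified sketch that assumes the two structures are already superimposed and represented by one point per residue; the official CASP calculation also searches over many superpositions, which is omitted here. The coordinates are made up for illustration.

```python
import math


def gdt_ts(predicted, reference, thresholds=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of 3D points.

    Returns a score from 0 to 100. Assumes one point per residue and
    that the structures have already been superimposed.
    """
    if not reference or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and equally sized")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in thresholds
    ]
    return 100.0 * sum(fractions) / len(thresholds)


# Made-up example: four residues, predicted 0.5, 1.5, 3.0, and 9.0 Å off
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

A perfect prediction scores 100, so a median of 92.4 means the large majority of residues sat within even the tightest cutoffs.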

AlphaFold has been instrumental in numerous novel research projects, including studies of proteins that might degrade environmental pollutants such as plastics and efforts to improve our understanding of neglected tropical diseases like leishmaniasis and Chagas disease.

In July 2021, DeepMind, in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), released the AlphaFold Protein Structure Database, which provides access to over 350,000 protein structure predictions, including the complete human proteome.

This database has since been expanded to include over 200 million structures, covering nearly all cataloged proteins known to science.

To date, the AlphaFold Protein Structure Database has been accessed by over one million users in over 190 countries, enabling discoveries in fields ranging from medicine to agriculture and beyond.

AlphaFold 3 marks another iteration for this best-in-class protein discovery and analysis system.