
How have AlphaFold2’s predictions of protein structure been validated?

AlphaFold2’s ability to predict protein structure was first demonstrated when it won the CASP14 assessment of structure predictions. Since then it has been validated by multiple lines of evidence from structural biology experiments, including X-ray crystallography, cryogenic electron microscopy (cryo-EM), nuclear magnetic resonance (NMR) and cross-linking mass spectrometry.

AlphaFold’s success in CASP

Critical Assessment of Structure Prediction (CASP) is an experimental test of protein structure predictions. It has been carried out every two years since 1994. The assessment is open to anyone.

CASP entrants submit predicted structures for a set of target proteins. The structures of these proteins have already been determined experimentally, by X-ray crystallography, nuclear magnetic resonance (NMR) or cryogenic electron microscopy (cryo-EM), but they are not released to the public until the assessment is over. The predicted structures are then compared against the experimental structures.

Google DeepMind entered structure predictions from AlphaFold2 into CASP14 in 2020. The software outperformed all the other entrants by a wide margin.

Figure 7. The ten highest scoring entries into CASP14 in 2020, based on their cumulative scores across all proteins attempted. AlphaFold2 was by far the most successful.

Previously, overall structure prediction accuracy, measured by the Global Distance Test total score (GDT_TS, on a scale from 0 to 100), had only reached about 60. AlphaFold2 scored over 90. This score meant the predicted protein structures closely matched the experimentally resolved structures. CASP coordinators proclaimed that the protein-folding problem had been “largely solved”, at least for single protein chains.
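To make the GDT_TS score more concrete, here is a minimal sketch of how such a score can be computed. GDT_TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within that cutoff of its experimental position. The sketch assumes the two structures are already superimposed and matched residue by residue. The official CASP score searches over many superpositions and takes the best value for each cutoff, so this simplified version can only underestimate it. The function name and the toy coordinates are illustrative, not part of any CASP or AlphaFold tool.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two C-alpha traces.

    Assumption: both traces are already superimposed and list the same
    residues in the same order. Coordinates are (x, y, z) in angstroms.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and equal in length")

    # Per-residue distance between predicted and experimental C-alpha atoms.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues within each cutoff, then averaged and scaled to 0-100.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

In the toy example, 25%, 50%, 75% and 75% of residues fall within the 1, 2, 4 and 8 Å cutoffs respectively, which averages to 56.25. A score above 90 therefore requires nearly every residue to sit within 1 to 2 Å of its experimental position.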

Google DeepMind had entered an earlier version of AlphaFold in CASP13 in 2018. It took first place, but by a small margin. Its predictions were not yet accurate enough for the protein structure prediction problem to be considered solved.

Figure 8. Overall success at protein structure prediction in CASPs over the years. AlphaFold drove rapid improvements in 2018 and 2020.

Google DeepMind did not directly participate in CASP15 in 2022. However, all the top performers used modified or customised versions of AlphaFold2. Because Google DeepMind released the source code for AlphaFold, other researchers were able to build on it and in some cases outperform the standard version of the software (Elofsson, 2023; Kryshtafovych et al., 2023).

Figure 9. The highest-scoring entries in CASP15 in 2022. All top performers used some version of AlphaFold2 in their predictions.

Subsequent evidence from structural biology

In CASP14, AlphaFold2 succeeded in predicting the structures of dozens of proteins. However, there are millions of proteins in nature. Hence subsequent experimenters have subjected the software to further validation.

Structural biology experiments demonstrate that AlphaFold2 structures (or well-defined parts of the predicted structures, like protein domains) work well as search models for molecular replacement in X-ray crystallography (Barbarin-Bocahu and Graille, 2022; McCoy et al., 2022; Millán et al., 2021). This implies the AlphaFold2 structures closely resemble the protein crystal structures.

AlphaFold2 structures fit well into experimental cryo-EM density maps (Chojnowski, 2022; Giri et al., 2023). This again suggests a good match between structure predictions and the experimental data.

AlphaFold2 structures are still a good fit when proteins are in solution, as opposed to crystallised. When AlphaFold2 models were used to interpret nuclear magnetic resonance (NMR) data obtained in solution, they fitted the data very well in the vast majority of cases (Fowler and Williamson, 2022; Tejero et al., 2022). Interestingly, this indicates that AlphaFold2 models are not strongly biased towards predicting a crystal state, even though AlphaFold2 was trained mainly on data derived from protein crystals.

Figure 10. Specialized acyl carrier protein

Notably, AlphaFold’s prediction (AlphaFold ID: AF-Q6N882-F1) matches the NMR structure (green, PDB ID: 2LPK) more closely than it matches the corresponding X-ray crystal structure (grey, PDB ID: 3LMO) (Tejero et al., 2022).

Cross-linking mass spectrometry experiments showed that the majority of AlphaFold2 structure predictions were correct for both single protein chains and protein-protein complexes in situ (Bartolec et al., 2023; McCafferty et al., 2023).
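One common way to compare a predicted structure with cross-linking data is to check whether each pair of cross-linked residues lies close enough in the model for the cross-linker to span the gap. The sketch below illustrates that idea only and is not the method used in the cited studies. The function name, the toy coordinates and the 30 Å Cα-Cα limit are assumptions made for illustration, and the appropriate limit depends on the cross-linker used.

```python
import math


def crosslink_satisfaction(ca_coords, crosslinks, max_distance=30.0):
    """Fraction of cross-links consistent with a structural model.

    ca_coords: dict mapping residue number -> (x, y, z) of its C-alpha atom.
    crosslinks: list of (residue_i, residue_j) pairs observed by experiment.
    max_distance: assumed C-alpha to C-alpha limit in angstroms (hypothetical
        default; choose a value suited to the actual cross-linker).
    """
    # Only cross-links whose two residues are both present in the model
    # can be evaluated.
    evaluated = [(i, j) for i, j in crosslinks if i in ca_coords and j in ca_coords]
    if not evaluated:
        raise ValueError("no cross-link has both residues in the model")

    satisfied = [
        (i, j)
        for i, j in evaluated
        if math.dist(ca_coords[i], ca_coords[j]) <= max_distance
    ]
    return len(satisfied) / len(evaluated)


# Toy model with three residues; one cross-link spans 12 angstroms, the other 40.
model = {10: (0.0, 0.0, 0.0), 50: (12.0, 0.0, 0.0), 120: (0.0, 40.0, 0.0)}
links = [(10, 50), (10, 120)]
print(crosslink_satisfaction(model, links))  # 0.5
```

A model in which most cross-links are satisfied is consistent with the protein’s conformation in the sample, while violated cross-links can point to mispredicted regions or to alternative conformations.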

Taken together, these data validate AlphaFold2’s accuracy. They also suggest that AlphaFold2 models can be useful for a variety of research applications.
