Interpreting results from AlphaFold Server
Alongside predicted structures, AlphaFold 3 supplies a range of confidence metrics, enabling you to assess the accuracy of the predictions. The confidence metrics are similar to those used by AlphaFold 2.
However, because AlphaFold 3 predicts the structures of multimolecular complexes, there are additional factors to consider. AlphaFold 3 is not intended for, validated for, or approved for clinical use.
Outputs provided by AlphaFold Server
AlphaFold Server produces five predictions per job. (Technically, five diffusion samples per seed, but currently each job runs one seed.)
The top-ranked prediction is displayed on the results page. Predicted structures are ranked using the ranking_score metric. This uses two measures of confidence in the overall structure (pTM and ipTM), but also includes terms that penalise clashes and discourage spurious helices in disordered regions. Because of these extra terms, ranking_score should be used only to rank predictions against each other, not as a standalone measure of confidence.
All five samples, along with their associated confidences, are available to download as a zip file. This contains:
- Five .cif files named fold_<job_name>_model_<N>.cif, where “<N>” is the rank of the predicted structure. Structures are ranked from 0 to 4, where 0 has the highest confidence. The .cif files contain predicted structures in the mmCIF format. They can be viewed in any molecular viewer, such as PyMOL or ChimeraX.
- Five .json files named fold_<job_name>_summary_confidences_<N>.json, where “<N>” is the rank of the predicted structure from 0 to 4. These .json files contain summaries of the confidence metrics for the predictions (see below for more details on confidence metrics).
- Five .json files named fold_<job_name>_full_data_<N>.json, where “<N>” is the rank of the predicted structure from 0 to 4. These .json files contain detailed confidence metrics, such as full PAE data, for the predictions (see below for more on confidence metrics).
- A file named fold_<job_name>_job_request.json. This contains the inputs of the modelling job and could be used to re-run the job (for more details, see “Advanced use of AlphaFold Server”).
- A file named terms_of_use.md. This is a legal document detailing the terms of use for the predictions.
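The naming scheme above makes it easy to group the downloaded files by rank. The following is a minimal sketch, assuming the zip file has already been extracted; the job name "myjob" and the helper group_by_rank are hypothetical and only serve as an illustration.

```python
import re
from pathlib import Path

# Pattern for the three per-rank file types described in the list above.
RANKED_FILE = re.compile(
    r"^fold_(?P<job>.+)_(?P<kind>model|summary_confidences|full_data)"
    r"_(?P<rank>\d+)\.(?:cif|json)$"
)


def group_by_rank(file_names):
    """Map rank -> {file kind: file name} for the per-prediction files."""
    grouped = {}
    for name in file_names:
        match = RANKED_FILE.match(Path(name).name)
        if match is None:
            continue  # e.g. the job request file or terms_of_use.md
        grouped.setdefault(int(match["rank"]), {})[match["kind"]] = name
    return grouped


# Hypothetical file listing; a real archive would contain ranks 0 to 4.
example = [
    "fold_myjob_model_0.cif",
    "fold_myjob_summary_confidences_0.json",
    "fold_myjob_full_data_0.json",
    "fold_myjob_job_request.json",
    "terms_of_use.md",
]
print(group_by_rank(example))
```

For a real download, the list of names could come from something like `[p.name for p in Path("extracted_dir").iterdir()]`, where the directory name is whatever you chose when extracting the archive.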
JSON is a text-based format, so it is both human- and machine-readable. You can inspect JSON files with any text editor, or use a programming language such as Python to read and visualise the outputs.
{
  "chain_iptm": [0.85, 0.86, 0.59, 0.59],
  "chain_pair_iptm": [
    [0.82, 0.9, 0.83, 0.83],
    [0.9, 0.82, 0.83, 0.84],
    [0.83, 0.83, 0.03, 0.1],
    [0.83, 0.84, 0.1, 0.03]
  ],
  "chain_pair_pae_min": [
    [0.76, 0.79, 1.0, 1.12],
    [0.79, 0.76, 1.11, 1.0],
    [0.98, 1.06, 0.78, 0.92],
    [1.05, 0.97, 0.92, 0.78]
  ],
  "chain_ptm": [0.82, 0.82, 0.03, 0.03],
  "fraction_disordered": 0.18,
  "has_clash": 0.0,
  "iptm": 0.91,
  "num_recycles": 10.0,
  "ptm": 0.91,
  "ranking_score": 1.0
}
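As a minimal sketch of reading such a file with Python, the snippet below parses an abbreviated copy of the example above using the standard json module. The helper pair_iptm is hypothetical, and the assumption that array indices follow the order of chains in the job is ours.

```python
import json

# Abbreviated copy of the example summary file shown above.
summary_text = """
{
  "chain_pair_iptm": [
    [0.82, 0.9, 0.83, 0.83],
    [0.9, 0.82, 0.83, 0.84],
    [0.83, 0.83, 0.03, 0.1],
    [0.83, 0.84, 0.1, 0.03]
  ],
  "iptm": 0.91,
  "ptm": 0.91,
  "ranking_score": 1.0
}
"""

summary = json.loads(summary_text)


def pair_iptm(summary, i, j):
    """Return the pairwise ipTM for chains i and j (zero-based indices)."""
    return summary["chain_pair_iptm"][i][j]


print("pTM:", summary["ptm"])
print("ipTM:", summary["iptm"])
print("ipTM, chains 0 and 1:", pair_iptm(summary, 0, 1))
print("ipTM, chains 2 and 3:", pair_iptm(summary, 2, 3))
```

To read a downloaded file rather than an inline string, open it and call `json.load` on the file object instead of `json.loads` on the text.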
Confidence metrics
Some of the metrics in the JSON files are very straightforward: for instance, the “ptm” record contains the overall pTM score. However, some other metrics are more targeted at advanced users. Full explanations of the confidence metrics are provided in subsequent sections.
JSON files with summary outputs contain the following information:
- chain_iptm: A [num_chains] array that gives the average confidence (ipTM) in the interfaces between each chain and all other chains. This can be used for ranking predicted structures for a specific chain, when you care about where the chain binds to the rest of the complex and you do not know which other chains you expect it to interact with. This is often the case with ligands, each of which the system treats as a separate chain.
- chain_pair_iptm: A square [num_chains, num_chains] array representing pairwise ipTM scores. The off-diagonal element (i, j) of the array contains the ipTM restricted to tokens from chains i and j. The diagonal element (i, i) contains the pTM restricted to chain i. The array can be used for ranking predictions of a structure by the accuracy of a specific interface between two chains that you know interact, e.g. antibody-antigen interactions. As these values are calculated based on tokens, this metric also encompasses small molecules and chemically-modified residues and nucleotides.
- chain_pair_pae_min: A square [num_chains, num_chains] array of PAE values. Element (i, j) of the array contains the lowest PAE value across rows restricted to chain i and columns restricted to chain j. This has been found to correlate with whether or not two chains interact, so it can be used to distinguish interacting and non-interacting molecules. As these values are calculated based on tokens, this metric also encompasses small molecules and chemically-modified residues and nucleotides.
- chain_ptm: A [num_chains] array. Element i contains the pTM restricted to chain i. This can be used for ranking the predicted structures of individual chains when you are most interested in the structure of that chain, rather than its cross-chain interactions.
- fraction_disordered: A scalar in the range 0-1 that indicates what fraction of the predicted structure is disordered, as measured by accessible surface area (see Abramson et al., 2024 for details).
- has_clash: A Boolean, i.e. a yes/no value (stored in the JSON as 0.0 or 1.0), indicating whether the structure has a significant number of clashing atoms (more than 50% of a chain, or a chain with more than 100 clashing atoms).
- iptm: A scalar in the range 0-1 indicating predicted interface TM-score (confidence in the predicted interfaces) for all interfaces in the structure.
- num_recycles: An integer number that represents the total number of recycles.
- ptm: A scalar in the range 0-1 indicating the predicted TM-score for the full structure.
- ranking_score: A scalar ranging from -100 to 1.5 that can be used for ranking predictions. It combines ptm, iptm, fraction_disordered and has_clash into a single number with the following equation:
0.8 × ipTM + 0.2 × pTM + 0.5 × disorder − 100 × has_clash
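As a worked check of the equation above, the sketch below plugs in the values from the example summary file (iptm 0.91, ptm 0.91, fraction_disordered 0.18, has_clash 0.0). The function name is ours, and because the reported values are rounded to two decimal places, a recomputed score should be expected to match only approximately.

```python
def ranking_score(iptm, ptm, fraction_disordered, has_clash):
    """Combine the four summary metrics using the equation given above."""
    return (
        0.8 * iptm
        + 0.2 * ptm
        + 0.5 * fraction_disordered
        - 100.0 * has_clash
    )


# Values taken from the example summary file.
score = ranking_score(iptm=0.91, ptm=0.91, fraction_disordered=0.18, has_clash=0.0)
print(round(score, 2))  # 1.0, matching the reported ranking_score
```

The extremes of the stated range follow from the same equation: all confidence terms at 0 with a clash gives -100, and all terms at 1 without a clash gives 1.5.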
JSON files with full outputs contain the following information:
- atom_chain_ids: A [num_atoms] array indicating the chain IDs corresponding to each atom in the prediction.
- atom_plddts: A [num_atoms] array. Element i indicates the predicted local distance difference test (pLDDT) for atom i in the prediction.
- contact_probs: A square [num_tokens, num_tokens] array. Element (i, j) indicates the predicted probability that token i and token j are in contact, where “in contact” is defined as a maximum distance of 8 Å between a system-defined representative atom for each token (for details, see Abramson et al., 2024).
- pae: A square [num_tokens, num_tokens] array. Element (i, j) indicates the predicted aligned error (PAE) in the position of token j, when the prediction is aligned to the ground truth using the frame of token i.
- token_chain_ids: A [num_tokens] array indicating the chain IDs corresponding to each token in the prediction.
- token_res_ids: A [num_tokens] array indicating the residue IDs corresponding to each token in the prediction.
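Because atom_chain_ids and atom_plddts are parallel arrays, they can be combined to summarise local confidence per chain. The following is a minimal sketch with made-up toy values (a real file holds one entry per atom); the helper name and the chain labels "A" and "B" are illustrative assumptions.

```python
from collections import defaultdict


def mean_plddt_per_chain(atom_chain_ids, atom_plddts):
    """Average the per-atom pLDDT values separately for each chain."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for chain_id, plddt in zip(atom_chain_ids, atom_plddts):
        totals[chain_id] += plddt
        counts[chain_id] += 1
    return {chain_id: totals[chain_id] / counts[chain_id] for chain_id in totals}


# Toy stand-in for the two arrays in a full data file (values are invented).
full_data = {
    "atom_chain_ids": ["A", "A", "A", "B", "B"],
    "atom_plddts": [90.0, 80.0, 70.0, 40.0, 60.0],
}

print(mean_plddt_per_chain(full_data["atom_chain_ids"], full_data["atom_plddts"]))
```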
JSON files with full outputs (fold_<job_name>_full_data_<N>.json) can be used with tools like the latest version of ChimeraX or PAE Viewer. In this way, you can visualise dynamic PAE plots and match PAE data onto predicted structures stored in the fold_<job_name>_model_<N>.cif files.
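The pae and token_chain_ids arrays can also be processed directly. The sketch below follows the definition of chain_pair_pae_min given earlier (the lowest PAE over rows from chain i and columns from chain j), using a made-up three-token example. Whether AlphaFold Server computes the summary value in exactly this way is an assumption, and the helper name is ours.

```python
def chain_pair_pae_min(pae, token_chain_ids):
    """Lowest PAE for every ordered pair of chains.

    Rows are restricted to tokens of the first chain and columns to
    tokens of the second chain, as in the summary metric definition.
    """
    chains = list(dict.fromkeys(token_chain_ids))  # unique IDs, original order
    result = {}
    for row_chain in chains:
        rows = [k for k, c in enumerate(token_chain_ids) if c == row_chain]
        for col_chain in chains:
            cols = [k for k, c in enumerate(token_chain_ids) if c == col_chain]
            result[(row_chain, col_chain)] = min(
                pae[r][c] for r in rows for c in cols
            )
    return result


# Toy data: two tokens in chain "A", one token in chain "B" (values invented).
token_chain_ids = ["A", "A", "B"]
pae = [
    [0.5, 1.0, 8.0],
    [1.2, 0.4, 6.0],
    [7.0, 9.0, 0.3],
]

pair_min = chain_pair_pae_min(pae, token_chain_ids)
print(pair_min[("A", "B")], pair_min[("B", "A")])
```

Note that the result is not necessarily symmetric, since PAE element (i, j) depends on which token provides the alignment frame.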