Understanding pathogenicity scores from AlphaMissense

AlphaMissense assigns each missense variant a score between 0 and 1 that estimates the probability of the variant being pathogenic. For example, a score of 0.8 means that roughly 8 out of 10 variants with this score are expected to be pathogenic.

The pathogenicity scores are generated by rescaling raw AlphaMissense data using a logistic regression model trained on data from ClinVar. The aim was to generate a single number for ease of interpretation.
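As a rough illustration of this kind of rescaling, the sketch below fits a one-variable logistic regression that maps a raw score to a probability between 0 and 1. It is not the AlphaMissense pipeline: the raw values, the labels, the function names and the training settings are all hypothetical, chosen only to show the general idea.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(raw_scores, labels, lr=0.1, steps=5000):
    """Fit p = sigmoid(a * raw + b) by gradient descent on the log loss."""
    a, b = 0.0, 0.0
    n = len(raw_scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for x, y in zip(raw_scores, labels):
            err = sigmoid(a * x + b) - y  # prediction minus label
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical raw model outputs with made-up labels
# (1 = pathogenic, 0 = benign); not real ClinVar data.
raw = [-3.0, -2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 2.0, 2.5, 3.0]
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

a, b = fit_logistic(raw, labels)
calibrated = [sigmoid(a * x + b) for x in raw]
```

After fitting, each value in `calibrated` lies between 0 and 1 and rises with the raw score, which is what allows the output to be read as a single probability-like number.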

To further facilitate interpretation, scores can be divided into three categories:

  • below 0.34: likely benign
  • 0.34 to 0.564: ambiguous
  • above 0.564: likely pathogenic

These cutoffs were determined using precision and recall curves to ensure 90% precision for the pathogenic and benign classes. Variants that do not meet this precision are classified as ambiguous.
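The three categories can be expressed as a small helper function. This is a minimal sketch: the function and constant names are hypothetical, and the handling of scores that fall exactly on a cutoff (treated here as ambiguous) is an assumption, since the ranges above do not say which class the boundary values belong to.

```python
LIKELY_BENIGN_CUTOFF = 0.34       # scores below this: likely benign
LIKELY_PATHOGENIC_CUTOFF = 0.564  # scores above this: likely pathogenic


def classify(score):
    """Map an AlphaMissense pathogenicity score to one of three classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < LIKELY_BENIGN_CUTOFF:
        return "likely benign"
    if score > LIKELY_PATHOGENIC_CUTOFF:
        return "likely pathogenic"
    # Assumption: values exactly on a cutoff are treated as ambiguous.
    return "ambiguous"
```

For instance, `classify(0.8)` returns "likely pathogenic" and `classify(0.45)` returns "ambiguous".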

Figure 32. Precision and recall of AlphaMissense predictions on the ClinVar test set. Blue curves represent precision and recall for variants labeled benign (AlphaMissense score between 0 and 0.5), while red curves represent variants labeled pathogenic (AlphaMissense score between 0.5 and 1). Classification thresholds were set to achieve 90% precision for both benign and pathogenic classifications. Variants not meeting this 90% precision threshold were defined as ambiguous. The dotted line indicates the expected precision, calculated as the average predicted probability of a variant being pathogenic or benign.
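To make the idea of a precision-based cutoff concrete, the sketch below searches a small labeled set for the lowest score threshold at which pathogenic calls reach 90% precision. The scores, labels and function names are hypothetical toy values, not ClinVar data, and the search is a simplified stand-in for the curve-based procedure described in the caption.

```python
def precision_at(scores, labels, threshold):
    """Precision when every variant with score >= threshold is called pathogenic."""
    called = [y for s, y in zip(scores, labels) if s >= threshold]
    if not called:
        return None  # no calls made, so precision is undefined
    return sum(called) / len(called)


def lowest_threshold_for_precision(scores, labels, target=0.9):
    """Smallest observed score at which pathogenic calls reach the target precision."""
    for t in sorted(set(scores)):
        p = precision_at(scores, labels, t)
        if p is not None and p >= target:
            return t
    return None  # target precision is never reached


# Hypothetical scores with made-up labels (1 = pathogenic, 0 = benign).
scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1]

pathogenic_cutoff = lowest_threshold_for_precision(scores, labels, target=0.9)
```

Taking the lowest qualifying threshold keeps as many true pathogenic variants as possible (high recall) while still meeting the precision target. A benign cutoff could be found the same way by scanning from the other end with the labels inverted.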

Applications of AlphaMissense

AlphaMissense scores are valuable for designing and interpreting experiments related to protein function. Examples include Multiplexed Assays of Variant Effect (MAVEs), which measure the effects of thousands of genetic variants in parallel.

AlphaMissense can help elucidate the molecular effects of variants on protein function. This can contribute to the discovery of disease-causing genes and improve diagnostic accuracy.

When used in the context of a 3D structure, AlphaMissense can give insight into important functional regions. To make the scores easier to visualise, they have been integrated into the AlphaFold Database, where they can be assessed in the context of predicted structures for the entire human proteome.