Use case 1: Building a protein disorder prediction method

Problem

We would like to build a protein disorder prediction method based on AlphaFold2 structures, using the accessibility and pLDDT values of residues. To fine-tune the parameters, we prepared a dataset of disordered regions from DisProt and a dataset of ordered monomeric proteins from the PDB, which we mapped to UniProt using SIFTS. We do not have the capacity to run AlphaFold2 on our own computers, so we download precomputed AlphaFold2 structures from the AlphaFold Database.
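As a minimal sketch of how the pLDDT values could be read from a downloaded model: AlphaFold Database PDB files store the per-residue pLDDT in the B-factor column, so it can be parsed from the fixed-column ATOM records. The function names and the cutoff of 50 are illustrative assumptions, not part of the method; the actual cutoff is what the benchmark set is meant to tune.

```python
def read_plddt(pdb_text):
    """Return {residue_number: pLDDT} taken from the CA atoms of a PDB-format model."""
    plddt = {}
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16,
        # residue number in columns 23-26, B-factor in columns 61-66.
        # AlphaFold models store pLDDT in the B-factor column.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def low_confidence_residues(plddt, cutoff=50.0):
    """Residues whose pLDDT is below the cutoff (placeholder value, to be tuned)."""
    return sorted(res for res, score in plddt.items() if score < cutoff)
```

Accessibility would have to be computed separately from the coordinates with a tool of choice; it is left out here to keep the sketch self-contained.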

Initial dataset

We have a list of UniProt ACs containing disordered and ordered regions/proteins.

?
I downloaded structures from AlphaFold Database. Can I use them to test my hypothesis?
!
Use structures from the latest benchmark version, with full-length proteins after the cutoff date of 01.03.2021.

Proposed protocol with BETA

  1. The AlphaFold Database uses an AlphaFold version trained on structures released before 30.04.2018 [ref], but it also utilizes template structures released before 15.02.2021.
  2. Go to the BETA download page and select the benchmark version. Since we have UniProt AC lists, we need full-length proteins for the benchmark set. For the cutoff date, we select the closest date after both the training cutoff and the template library release, that is, 01.03.2021.
  3. The intersection of the original benchmark set (step 1) and the filtered set from BETA (step 2) should be used to fine-tune the cutoff values.
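The intersection in step 3 can be sketched as a plain set operation on UniProt ACs. The layout of the BETA download is not described here, so the mapping `beta_release_dates` (UniProt AC to a DD.MM.YYYY date) is a hypothetical stand-in for whatever the downloaded file provides. The date comparison only re-checks the cutoff already selected on the download page, and treating "after" as strictly later is an assumption.

```python
from datetime import date

# Cutoff from step 2: the closest date after training and template library release.
CUTOFF = date(2021, 3, 1)


def parse_date(text):
    """Parse a DD.MM.YYYY date as written in this protocol."""
    day, month, year = (int(part) for part in text.split("."))
    return date(year, month, day)


def tuning_set(original_acs, beta_release_dates, cutoff=CUTOFF):
    """Return the UniProt ACs found in both sets and dated after the cutoff.

    beta_release_dates is a hypothetical {UniProt AC: "DD.MM.YYYY"} mapping;
    adapt it to the actual format of the BETA download.
    """
    filtered = {
        ac
        for ac, released in beta_release_dates.items()
        if parse_date(released) > cutoff  # assumption: strictly after the cutoff
    }
    return sorted(set(original_acs) & filtered)
```

Only the ACs returned by `tuning_set` would then be used to tune the cutoff values, since their structures cannot have been seen during training or used as templates.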