- Problem Description
- Getting the Data
- Read SIFTS Mappings
- Find PDB IDs
- Summarizing the Data
- Visualizing the Data
Problem Description
The Structure Integration with Function, Taxonomy and Sequence (SIFTS) database provides mappings between UniProt and PDB, as well as annotations from GO, InterPro, Pfam, CATH, SCOP, PubMed, Ensembl and other resources. Here, we map all the receptors from the Pharos database to their PDB IDs, using their UniProt accession numbers.
The goal is to obtain a dataset of human targets with available structures and known ligand binding affinities. I also want to see how these PDB structures are distributed across receptor families: kinases, GPCRs, ion channels, nuclear receptors, and transporters.
Getting the Data
First, we read in the CSV files downloaded from Pharos for targets at the Tclin (targets with approved drugs) and Tchem (targets with known binding affinities) development levels, for several receptor classes. The CSV files contain the UniProt ID of each receptor. All downloaded data and code are available in my GitHub repository: ravila4/Pharos-to-PDB
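As a rough sketch of this loading step, the snippet below stacks the Tclin and Tchem exports for one receptor class into a single DataFrame. The column name `UniProt`, the `level` column, the helper `load_pharos_class`, and the file naming scheme in the comment are all assumptions made for illustration; check them against the actual files in the repository.

```python
import io

import pandas as pd

# Receptor classes covered in this post (names match the summary keys).
CLASSES = ["GPCRs", "ion-channels", "kinases", "nuclear-receptors", "transporters"]


def load_pharos_class(tclin_csv, tchem_csv):
    """Read the Tclin and Tchem exports for one receptor class and stack them.

    Both arguments may be file paths or open file-like objects.
    """
    tclin = pd.read_csv(tclin_csv).assign(level="Tclin")
    tchem = pd.read_csv(tchem_csv).assign(level="Tchem")
    return pd.concat([tclin, tchem], ignore_index=True)


# Stand-in data so the sketch runs without the downloaded files.
# The accessions are placeholders, not real Pharos rows.
tclin_demo = io.StringIO("UniProt,Name\nACC0001,placeholder target 1\n")
tchem_demo = io.StringIO("UniProt,Name\nACC0002,placeholder target 2\n")
demo = load_pharos_class(tclin_demo, tchem_demo)
print(demo)

# With the real files (hypothetical naming scheme):
# pharos = {c: load_pharos_class(f"data/{c}_Tclin.csv", f"data/{c}_Tchem.csv")
#           for c in CLASSES}
```

Keeping one DataFrame per class, in a dictionary keyed by class name, makes the later per-class summary a simple loop.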
Read SIFTS Mappings
The UniProt-to-PDB mappings were downloaded as a CSV file from the SIFTS FTP site.
| | SP_PRIMARY | PDB |
|---|---|---|
| 0 | A0A010 | 5b00;5b01;5b02;5b03;5b0i;5b0j;5b0k;5b0l;5b0m;5... |
| 1 | A0A011 | 3vk5;3vka;3vkb;3vkc;3vkd |
| 2 | A0A014C6J9 | 6br7 |
| 3 | A0A016UNP9 | 2md0 |
| 4 | A0A023GPI4 | 2m6j |
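A table like the one above could be produced with `pandas.read_csv`. This is a minimal sketch, assuming the file has the two columns shown (`SP_PRIMARY` and `PDB`) and may start with a `#` comment line; the helper name `read_sifts` is made up here. The in-memory sample reuses two rows from the table so the snippet runs without the download.

```python
import io

import pandas as pd


def read_sifts(source):
    """Read a UniProt-to-PDB mapping table from a path or file-like object.

    comment="#" skips a leading comment line if one is present (assumption),
    and dtype=str keeps accessions and PDB IDs as plain strings.
    """
    return pd.read_csv(source, comment="#", dtype=str)


# Two rows copied from the table above, in the assumed file layout.
sample = io.StringIO(
    "# sample header comment\n"
    "SP_PRIMARY,PDB\n"
    "A0A011,3vk5;3vka;3vkb;3vkc;3vkd\n"
    "A0A014C6J9,6br7\n"
)
sifts = read_sifts(sample)
print(sifts.head())
```

Each row holds one UniProt accession and a semicolon-separated list of PDB IDs, so `sifts["PDB"].str.split(";")` would give the individual structures if needed.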
Find PDB IDs
Here’s a function for joining the two DataFrames:
Adding PDB IDs to Pharos targets:
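The original code cell is not reproduced here, so the following is only a sketch of one way to do the join: a left merge on the UniProt accession, which keeps every Pharos target and leaves `PDB` empty where SIFTS has no entry. The function name `add_pdb_ids` and the `UniProt` column name are assumptions; the demo rows are illustrative, not real Pharos targets.

```python
import pandas as pd


def add_pdb_ids(targets, sifts, uniprot_col="UniProt"):
    """Attach the SIFTS PDB column to a table of targets.

    A left merge keeps every target; those without a structure get NaN.
    """
    merged = targets.merge(
        sifts, how="left", left_on=uniprot_col, right_on="SP_PRIMARY"
    )
    # SP_PRIMARY duplicates the accession column after the merge.
    return merged.drop(columns="SP_PRIMARY")


# Illustrative inputs: one accession from the SIFTS table above and one
# placeholder accession that has no mapping.
sifts = pd.DataFrame(
    {"SP_PRIMARY": ["A0A011", "A0A014C6J9"],
     "PDB": ["3vk5;3vka;3vkb;3vkc;3vkd", "6br7"]}
)
targets = pd.DataFrame({"UniProt": ["A0A011", "ACC0002"]})

with_pdb = add_pdb_ids(targets, sifts)
print(with_pdb)
```

Applying the same function to each receptor class table in turn adds a `PDB` column to all of them.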
Summarizing the Data
Number of receptors in each class with at least one structure in the Protein Data Bank:
{'GPCRs': 77,
'ion-channels': 70,
'kinases': 304,
'nuclear-receptors': 41,
'transporters': 15}
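Counts of this kind could be computed by checking, for each class, how many targets have a non-missing `PDB` value after the join. The sketch below assumes one DataFrame per class with a `PDB` column; `count_with_structures` is a hypothetical helper, and the toy tables use placeholder strings rather than real data.

```python
import pandas as pd


def count_with_structures(tables):
    """Map each class name to the number of targets with a PDB entry."""
    return {name: int(df["PDB"].notna().sum()) for name, df in tables.items()}


# Toy tables with placeholder values, only to show the shape of the input.
toy = {
    "kinases": pd.DataFrame(
        {"UniProt": ["ACC0001", "ACC0002"], "PDB": ["xxxx;yyyy", None]}
    ),
    "GPCRs": pd.DataFrame({"UniProt": ["ACC0003"], "PDB": [None]}),
}
counts = count_with_structures(toy)
print(counts)
```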
Visualizing the Data
Finally, we visualize the results with a pie chart:
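A pie chart of the counts reported in the summary can be drawn with matplotlib along these lines. The numbers are the ones listed above; the title, figure size, and label formatting are arbitrary choices for this sketch, not necessarily those of the original figure.

```python
import matplotlib.pyplot as plt

# Receptors per class with at least one PDB structure (from the summary).
counts = {
    "GPCRs": 77,
    "ion-channels": 70,
    "kinases": 304,
    "nuclear-receptors": 41,
    "transporters": 15,
}

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    list(counts.values()),
    labels=list(counts.keys()),
    autopct="%1.1f%%",  # show each slice as a percentage of the total
    startangle=90,
)
ax.set_title("Pharos targets with at least one PDB structure")
```

In a notebook the figure renders inline; in a script, `plt.show()` or `fig.savefig(...)` would display or export it.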