Graphormer
Feature representations for new Proteins in DiG
Hi,
This is regarding protein generation in DiG.
I wanted to know how you obtained the features present in the protein pickle files. As per Appendix B.1 of the paper, the single and pair representations are simply outputs of a pre-trained Evoformer model from AlphaFold given the corresponding protein's Fasta sequence and MSAs.
I set up OpenFold on our systems and saved the representations from Evoformer in a pickle file for the corresponding protein. I used the single and pair keys in the output dictionary in this link. Also, to get the MSAs for the fasta sequence I queried the ColabFold server.
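For reference, the saving step can be sketched roughly as follows. This is a minimal sketch, not the DiG authors' pipeline: it assumes the model output is a dictionary whose single and pair entries convert to NumPy arrays of shape (L, c_s) and (L, L, c_z), and the function name `save_representations` and the file path are hypothetical.

```python
import pickle

import numpy as np


def save_representations(output, path):
    """Save the 'single' and 'pair' entries of an output dict to a pickle file.

    Hypothetical helper: `output` is assumed to be a dict holding array-like
    values under the keys 'single' and 'pair'. Torch tensors should be
    converted with .detach().cpu().numpy() before being passed in.
    """
    reps = {
        "single": np.asarray(output["single"], dtype=np.float32),
        "pair": np.asarray(output["pair"], dtype=np.float32),
    }

    # Sanity check: the pair representation should be L x L for the same L
    # as the single representation.
    n_res = reps["single"].shape[0]
    if reps["pair"].shape[:2] != (n_res, n_res):
        raise ValueError(
            f"pair shape {reps['pair'].shape} does not match {n_res} residues"
        )

    with open(path, "wb") as fh:
        pickle.dump(reps, fh)
    return reps
```

Whether the dataset's pickle files use the same key names and dtype is not something I could confirm, so treat the layout above as an assumption.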
Unfortunately, the representations I received from OpenFold's Evoformer and the representations in the dataset's pickle file were quite different.
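To make "quite different" concrete, a comparison along these lines can be run on each pair of arrays. It is only a sketch: `compare_representations` is a hypothetical helper, and it assumes both representations have already been loaded as NumPy arrays for the same protein.

```python
import numpy as np


def compare_representations(a, b):
    """Report shape agreement and simple distance measures for two arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)

    # A shape mismatch usually points at a different sequence length or
    # channel dimension, so report it and stop.
    if a.shape != b.shape:
        return {"same_shape": False, "shape_a": a.shape, "shape_b": b.shape}

    diff = np.abs(a - b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    # Cosine similarity over the flattened arrays; undefined for zero vectors.
    cosine = float(np.dot(a.ravel(), b.ravel()) / denom) if denom > 0 else float("nan")

    return {
        "same_shape": True,
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "cosine": cosine,
    }
```

A matching shape with a low cosine similarity would suggest different weights, MSAs, or recycling settings rather than a different layout, but that interpretation is a guess on my part.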
Can you please let me know the exact method you used to obtain the single and pair representations for the respective protein fasta sequence?
Please use AlphaFold's representations.
@sai-advaith Hi, I assume you downloaded the datasets and checkpoints successfully. The token expired in May because of Microsoft policy. Would you mind sharing what you downloaded? Thanks very much!
Same!!! @sai-advaith please share!!! Or @LifeWorks do you have it?
I wrote a script (based on AlphaFlow) to extract Evoformer representations. This code will help you get the single and pair representations you'll need to run graphormer.
https://github.com/sai-advaith/evoformer_representation
Is this what you wanted @LifeWorks @amelie-iska ? (Feel free to star if it's relevant and let me know if you have any trouble running it)
Thanks for the prompt reply.
I wanted to get the checkpoints and dataset used by DiG to predict the distributions: https://github.com/microsoft/Graphormer/blob/main/distributional_graphormer/README.md. DiG's README gives a SAS token for downloading the trained model, but the token has expired and the authors have not posted a new share link yet.
Did you happen to download these datasets and checkpoints before the token expired? If so, would you mind resharing them through a Google share link or something similar?
https://github.com/microsoft/Graphormer/tree/main/distributional_graphormer/protein#trained-parameters
Thanks very much!
@LifeWorks and @sai-advaith, if either of you has the datasets and checkpoints, please let me know. I think @sai-advaith has a very useful repo, but it's unclear to me at the moment whether it is enough for running DiG. I think we need the dataset too, no? And the checkpoint isn't available now either? 😢 Let me know if either of you has time to discuss how to get DiG running. I had it running a couple of months ago, before they took down the datasets and checkpoints.
The dataset consisted of the protein FASTA sequences (which you can get online) and the Evoformer representations (from the repo I shared).
I will get back to you regarding the model weights.
I see. Thanks very much! I'm looking forward to the model weights!
Thanks so much @sai-advaith and @LifeWorks! I really appreciate the help getting the weights (and the excellent repo for getting the single and pair representations from Evoformer)! I'd like the protein-only weights, but also the protein-ligand weights if you have them or if either of you is able to get them. Please let me know how you would like to share the weights too.
The model weights and data are still private. Would anyone (@sai-advaith, @LifeWorks, @amelie-iska) be able to kindly share them with us?
I wish I had them @pujaltes. If you get them, please let me know. I still don't have them.