Feature representations for new Proteins in DiG

Open sai-advaith opened this issue 10 months ago • 12 comments

Hi,

This is regarding protein generation in DiG.

I wanted to know how you obtained the features present in the protein pickle files. As per Appendix B.1 of the paper, the single and pair representations are simply the outputs of a pre-trained Evoformer model from AlphaFold, given the corresponding protein's FASTA sequence and MSAs.

I set up OpenFold on our systems and saved the Evoformer representations for the corresponding protein in a pickle file. I used the single and pair keys in the output dictionary in this link. To get the MSAs for the FASTA sequence, I queried the ColabFold server.

Unfortunately, the representations I received from OpenFold's Evoformer and the representations in the dataset's pickle file were quite different.

Can you please let me know the exact method you used to obtain the single and pair representations for a given protein FASTA sequence?
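
For anyone trying to reproduce the mismatch, here is a minimal sketch of how the two pickle files could be compared. It assumes both files unpickle to a dictionary with `single` and `pair` entries that convert to NumPy arrays; the layout of the dataset's pickle file is an assumption on my part, and the function names are hypothetical.

```python
import pickle

import numpy as np

KEYS = ("single", "pair")  # keys from the OpenFold output dict; assumed for the dataset file


def load_representations(path):
    """Load a pickle file and return its single/pair entries as NumPy arrays."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return {key: np.asarray(data[key]) for key in KEYS}


def compare_representations(a, b):
    """Report shape agreement, max absolute difference, and cosine similarity per key."""
    report = {}
    for key in KEYS:
        x, y = a[key], b[key]
        if x.shape != y.shape:
            # Different shapes usually mean a different sequence length or channel size.
            report[key] = {"same_shape": False, "shapes": (x.shape, y.shape)}
            continue
        xf = x.astype(np.float64).ravel()
        yf = y.astype(np.float64).ravel()
        denom = np.linalg.norm(xf) * np.linalg.norm(yf)
        cosine = float(xf @ yf / denom) if denom > 0 else float("nan")
        report[key] = {
            "same_shape": True,
            "max_abs_diff": float(np.max(np.abs(xf - yf))),
            "cosine": cosine,
        }
    return report
```

A shape mismatch points to a different input or model configuration, while matching shapes with a low cosine similarity suggests different weights, MSAs, or recycling settings.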

sai-advaith avatar Apr 23 '24 10:04 sai-advaith

Please use AlphaFold's representations.

zhengsx avatar May 27 '24 05:05 zhengsx

@sai-advaith Hi, I assume you downloaded the datasets and checkpoints successfully. The token expired in May because of Microsoft policy. Would you mind sharing what you downloaded? Thanks very much!

LifeWorks avatar Jul 30 '24 00:07 LifeWorks

Same!!! @sai-advaith please share!!! Or @LifeWorks do you have it?

amelie-iska avatar Aug 02 '24 00:08 amelie-iska

I wrote a script (based on AlphaFlow) to extract Evoformer representations. This code will help you get the single and pair representations you'll need to run Graphormer.

https://github.com/sai-advaith/evoformer_representation

Is this what you wanted @LifeWorks @amelie-iska ? (Feel free to star if it's relevant and let me know if you have any trouble running it)

sai-advaith avatar Aug 02 '24 08:08 sai-advaith

Thanks for the prompt reply.

I wanted to get the checkpoints and dataset used by DiG to predict the distributions: https://github.com/microsoft/Graphormer/blob/main/distributional_graphormer/README.md. DiG's README gives a SAS token for downloading the trained DiG model, but the token has expired and the authors haven't posted a new share link yet.

Did you happen to download all these datasets and checkpoints before the token expired? If so, would you mind resharing them through a Google share or something similar?

https://github.com/microsoft/Graphormer/tree/main/distributional_graphormer/protein#trained-parameters

Thanks very much!

LifeWorks avatar Aug 02 '24 17:08 LifeWorks

@LifeWorks and @sai-advaith if either of you has the datasets and checkpoints, please let me know. I think @sai-advaith has a very useful repo, but it's unclear to me at the moment whether it is enough for running DiG. I think we need the dataset too, no? And the checkpoint isn't available now either? 😢 Let me know if either of you has time to discuss how to get DiG running. I had it running a couple of months ago, before they took down the datasets and checkpoints.

amelie-iska avatar Aug 02 '24 18:08 amelie-iska

The dataset consisted of the protein FASTA sequence (which you can get online) and the Evoformer representation (from the repo I shared).

I will get back to you regarding the model weights.
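
A rough sketch of how a sequence and its representations could be bundled into one pickle file is below. The dictionary layout, the `sequence` key, and all function names are hypothetical, since the exact format DiG expects is not stated in this thread; the shape checks only assume that the single representation has one row per residue and that the pair representation is residue by residue.

```python
import pickle

import numpy as np


def read_fasta_sequence(text):
    """Return the sequence of the first record in a FASTA-formatted string."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    seq_lines = []
    for ln in lines[1:]:
        if ln.startswith(">"):
            break  # stop at the second record; only the first one is used
        seq_lines.append(ln)
    return "".join(seq_lines)


def bundle_features(sequence, single, pair):
    """Check that the array shapes match the sequence length and build one dict."""
    single = np.asarray(single)
    pair = np.asarray(pair)
    n = len(sequence)
    if single.ndim != 2 or single.shape[0] != n:
        raise ValueError(f"single must have shape ({n}, C), got {single.shape}")
    if pair.ndim != 3 or pair.shape[:2] != (n, n):
        raise ValueError(f"pair must have shape ({n}, {n}, C), got {pair.shape}")
    # Hypothetical layout: the real DiG pickle files may use different keys.
    return {"sequence": sequence, "single": single, "pair": pair}


def save_features(features, path):
    """Write the feature dictionary to a pickle file."""
    with open(path, "wb") as f:
        pickle.dump(features, f)
```

The shape checks catch the common case where the representations were computed for a different sequence than the one in the FASTA file.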

sai-advaith avatar Aug 02 '24 19:08 sai-advaith

I see. Thanks very much! I'm looking forward to the model weights!

LifeWorks avatar Aug 02 '24 19:08 LifeWorks

Thanks so much @sai-advaith and @LifeWorks! I really appreciate the help getting the weights (and the excellent repo for getting the single and pair representations from Evoformer)! I'd like the protein-only weights, but also the protein-ligand weights if you have them or if either of you is able to get them. Please let me know how you would like to share the weights too.

amelie-iska avatar Aug 02 '24 19:08 amelie-iska

The model weights and data are still private. Would anyone (@sai-advaith, @LifeWorks, @amelie-iska) be able to kindly share them with us?

pujaltes avatar Aug 09 '24 18:08 pujaltes

I wish I had them @pujaltes. If you get them, please let me know. I still don't have them.

amelie-iska avatar Aug 15 '24 00:08 amelie-iska