
How can I get `tfrecord` data of proteins?

Open eunos-1128 opened this issue 1 year ago • 2 comments

Thank you for your work to help train/fine-tune AF2/OpenFold/pLMFold models.

I tried to train pLMFold on my own protein datasets, but I couldn't figure out how to obtain the `tfrecord` data for the proteins.

Reading the paper, the supplementary data, and the README didn't help, because none of them describes in detail how to obtain the `tfrecord` data.

I tried using the AF2 modules to generate the data. That seems to work, but some features described in the paper (`template_all_atom_exists` and `pdb_cluster_size`) are missing from the features generated by the corresponding AF2 code.
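For anyone trying to reproduce the gap, here is a minimal sketch of how the missing features can be listed by comparing a generated feature dictionary against an expected list. The helper `missing_features`, the `expected` list, and the stand-in `generated` dictionary are all hypothetical; only `template_all_atom_exists` and `pdb_cluster_size` come from this issue, and the real expected list would have to be taken from the paper.

```python
from typing import Any, Iterable, List, Mapping


def missing_features(features: Mapping[str, Any], expected: Iterable[str]) -> List[str]:
    """Return the expected feature names that are absent from `features`, sorted."""
    return sorted(set(expected) - set(features))


# Hypothetical expected list: the last two names are the ones reported as
# missing in this issue; "aatype" is only a placeholder for the other features.
expected = ["aatype", "template_all_atom_exists", "pdb_cluster_size"]

# Stand-in for a feature dict produced by an AF2-style data pipeline (assumption).
generated = {"aatype": [0, 1, 2]}

print(missing_features(generated, expected))
# → ['pdb_cluster_size', 'template_all_atom_exists']
```

Running this on a real feature dictionary before serialization would show exactly which names still need to be produced by some other tool.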

How can I obtain the necessary features from my own protein data to train or fine-tune the model? Is there a tool for this?

I need your help.

Ref. #7

eunos-1128, Oct 05 '23 08:10

I have the same problem. Could you help me?

lijxgit, Oct 07 '23 04:10

Also, if you used only AF2 modules to generate the training data, could you tell me which AF2 modules (functions/methods) you used?

eunos-1128, Oct 10 '23 11:10