protein_scoring
Protein Scoring on Google Colab
Hi Sean,
I've been using the protein metrics notebook on Google Colab for the past few days, but I keep getting errors when I try to calculate scores for sequences generated with the ESM-MSA sampler.
The error occurs after I upload a list of sequences and run the Single-sequence metrics.
I'm not too familiar with how code works on Google Colab, so I was wondering if you could help me resolve this issue.
Thanks! Nam
What metrics are you trying to calculate? What errors are you seeing?
- Single sequence metrics should work. Just upload the fasta file of the generated sequences to the "target_seqs" directory.
- For structure metrics, you'd need to generate structures using AlphaFold2 or some other model and then upload those to the metrics Colab. We don't run structure prediction in our notebooks, so you'll have to do that with ColabFold or some other service (or locally on your own hardware).
- For alignment based metrics, you need to upload reference sequences to the "reference_seqs" folder.
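Before uploading, it can help to rule out a malformed FASTA file as the cause. The sketch below is not part of the metrics notebook; it is a stand-alone check, and the things it looks for (duplicate identifiers, empty sequences, characters outside the 20 standard amino acids) are only assumptions about what commonly trips up sequence tools, not a list of what the notebook actually rejects. The function names `parse_fasta` and `check_records` are made up for this example.

```python
# Hypothetical pre-upload check; not part of the protein_scoring notebooks.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(header, seq) for header, seq in records]


def check_records(records):
    """Return a list of human-readable problems (empty list if none found)."""
    problems = []
    seen = set()
    for header, seq in records:
        parts = header.split()
        name = parts[0] if parts else ""
        if not name:
            problems.append("record with an empty header")
        elif name in seen:
            problems.append(f"{name}: duplicate identifier")
        seen.add(name)
        if not seq:
            problems.append(f"{name}: empty sequence")
        # Assumption: anything outside the 20 standard residues is suspicious
        # (gap characters, stop codons, ambiguity codes such as X or B).
        bad = sorted(set(seq.upper()) - STANDARD_AA)
        if bad:
            problems.append(f"{name}: non-standard characters {''.join(bad)}")
    return problems
```

To use it, read the file you plan to upload (for example with `open("generated.fasta").read()`, where the file name is a placeholder), pass the text through `parse_fasta`, and print whatever `check_records` returns. An empty list means none of these particular problems were found; it does not guarantee the notebook will accept the file.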
I just did a quick test run. I downloaded "sequences.zip" from https://zenodo.org/records/10594384, extracted CuSOD_round2_train.fasta, copied the first 6 sequences into a new file called templates.fasta, used those two files in the sequence generation notebook, and then used the output in the metrics notebook. I only ran the single sequence and alignment based metrics, because I didn't want to make AlphaFold structures, but everything seemed to work.
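If you want to reproduce the "copy the first 6 sequences" step without editing the file by hand, something like the following should work. It is a minimal sketch in plain Python, assuming the input is an ordinary FASTA file; `read_fasta` and `write_first_n` are names invented for this example and are not functions from the notebooks.

```python
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_first_n(src, dst, n=6):
    """Copy the first n records of src into dst; return how many were written."""
    records = read_fasta(src)[:n]
    with open(dst, "w") as fh:
        for header, seq in records:
            fh.write(f">{header}\n{seq}\n")
    return len(records)


# Example call, assuming the extracted file is in the working directory:
# write_first_n("CuSOD_round2_train.fasta", "templates.fasta", n=6)
```

Each sequence is written on a single line, so any line wrapping in the original file is not preserved.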