
Protein Scoring on Google Colab

Open nkim23nyk6 opened this issue 1 year ago • 1 comments

Hi Sean,

I've been using the protein metrics notebook on Google Colab for the past few days, but I keep getting errors when I try to calculate scores for sequences generated with the ESM-MSA sampler.

The error occurs after I upload a list of sequences and run the Single-sequence metrics.

I'm not too familiar with how code works on Google Colab, so I was wondering if you would be able to help resolve this issue.

Thanks! Nam

nkim23nyk6 avatar Feb 17 '24 11:02 nkim23nyk6

What metrics are you trying to calculate? What errors are you seeing?

  • Single-sequence metrics should work. Just upload the fasta file of the generated sequences to the "target_seqs" directory.
  • For structure metrics, you'd need to generate structures using AlphaFold2 or some other model and then upload those to the metrics Colab. We don't run structure prediction in our notebooks, so you'll have to do that with ColabFold or some other service (or locally on your own hardware).
  • For alignment-based metrics, you need to upload reference sequences to the "reference_seqs" folder.
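If the error shows up right after uploading, it may be worth checking the fasta file itself before it goes into "target_seqs" or "reference_seqs". Below is a minimal sketch of such a check. It is not code from the notebooks: the function names are made up for illustration, and treating anything outside the 20 standard amino acid letters as a problem is an assumption, since the thread does not say which characters the metrics notebook accepts.

```python
# Hypothetical pre-upload check; not part of the protein_scoring notebooks.
# Assumption: only the 20 standard amino acid letters are expected.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(fasta_text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append([line[1:], ""])  # start a new record
        elif records:
            records[-1][1] += line  # sequences may span several lines
        else:
            raise ValueError("sequence data found before the first '>' header")
    return [(header, seq) for header, seq in records]


def find_fasta_problems(fasta_text):
    """Return a list of human-readable problems; an empty list means none found."""
    try:
        records = parse_fasta(fasta_text)
    except ValueError as exc:
        return [str(exc)]

    problems = []
    if not records:
        problems.append("no records found")
    for header, seq in records:
        if not seq:
            problems.append(f"{header!r}: empty sequence")
        unexpected = sorted(set(seq.upper()) - STANDARD_AMINO_ACIDS)
        if unexpected:
            problems.append(
                f"{header!r}: unexpected characters {''.join(unexpected)}"
            )
    return problems
```

To use it on a real file, read the file into a string first, for example `find_fasta_problems(open("generated.fasta").read())`, where "generated.fasta" is a placeholder name. Gap characters left over from an alignment would be reported as unexpected characters.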

I just did a quick test run. I downloaded "sequences.zip" from https://zenodo.org/records/10594384, extracted CuSOD_round2_train.fasta, and copied the first 6 sequences into a new file called templates.fasta. I used those two files in the sequence generation notebook, then used the output in the metrics notebook. I only ran the single-sequence and alignment-based metrics, because I didn't want to make AlphaFold structures, but everything seemed to work.
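The "copy the first 6 sequences" step can be done by hand in a text editor, but here is a rough sketch of the same thing in Python for anyone who wants to repeat the test. The function names are hypothetical and not part of the notebooks; the file names in the usage note are the ones mentioned above.

```python
# Hypothetical helper for building a small templates file; not notebook code.
def split_fasta_records(fasta_text):
    """Split FASTA text into records, each a list of lines (header first)."""
    records = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records.append([line])  # a header starts a new record
        elif records:
            records[-1].append(line)  # sequence line of the current record
    return records


def first_n_fasta_records(fasta_text, n):
    """Return FASTA text containing only the first n records."""
    records = split_fasta_records(fasta_text)[: max(n, 0)]
    return "".join("\n".join(record) + "\n" for record in records)
```

With the extracted file in the working directory, something like the following would write the templates file:

```python
with open("CuSOD_round2_train.fasta") as src:
    subset = first_n_fasta_records(src.read(), 6)
with open("templates.fasta", "w") as dst:
    dst.write(subset)
```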

seanrjohnson avatar Feb 17 '24 16:02 seanrjohnson