ColabFold
--msa-mode options without brackets and spaces?
Context
I installed ColabFold in a conda environment that I activate before launching colabfold_batch on my institute's HPC, which has GPUs available. To create the environment I followed more or less these steps from localcolabfold, with slight differences. I created the environment today, so I'm using the latest version.
When I run
conda activate colabfold
colabfold_batch --help
It all works well and returns:
usage: colabfold_batch [-h] [--stop-at-score STOP_AT_SCORE]
[--num-recycle NUM_RECYCLE] [--num-models {1,2,3,4,5}]
[--recompile-padding RECOMPILE_PADDING]
[--model-order MODEL_ORDER] [--host-url HOST_URL]
[--data DATA]
[--msa-mode {MMseqs2 UniRef+Environmental),MMseqs2 (UniRef only,single_sequence}]
[--model-type {auto,AlphaFold2-ptm,AlphaFold2-multimer}]
[--amber] [--templates] [--env] [--cpu]
[--rank {auto,plddt,ptmscore}]
[--pair-mode {unpaired,paired,unpaired+paired}]
[--recompile-all-models]
[--sort-queries-by {none,length,random}] [--zip]
[--overwrite-existing-results]
input results
positional arguments:
input Can be one of the following: Directory with fasta/a3m
files, a csv/tsv file, a fasta file or an a3m file
results Directory to write the results to
optional arguments:
-h, --help show this help message and exit
--stop-at-score STOP_AT_SCORE
.....
So I believe the installation worked well.
To launch colabfold on my institute's cluster I use qsub. I wrote a small bash script that exports some needed variables and packages the inputs and outputs to fit my folder structure, so that I can integrate the results into my current project.
As a minimal test, I submit a job like this:
qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
-l virtual_free=${Num_Ram}M,h_vmem=${Num_Ram}M,h_rt=${Num_Hours}:59:00 \
-b y \
colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1
it all works well.
Issue
The problem arises when I try to specify --msa-mode. In bash, parentheses and spaces just make things more annoying. Specifying --msa-mode MMseqs2 (UniRef+Environmental) or --msa-mode MMseqs2 \(UniRef+Environmental\) doesn't work, because bash, unlike the Python notebooks on Google Colab, treats parentheses and spaces as part of its own syntax. The error message is just:
syntax error near unexpected token `('
`colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode MMseqs2 (UniRef+Environmental)'
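To make the quoting problem concrete, here is a minimal sketch in plain shell; it is not taken from ColabFold or from the scheduler. count_args is a hypothetical helper that only reports how many arguments it received, and the sh -c step merely simulates a command line that has been flattened into one string and parsed a second time. Whether qsub -b y behaves exactly like that is an assumption, not something confirmed in this thread.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

# Quoted: the flag plus ONE value containing a space and parentheses.
quoted=$(count_args --msa-mode "MMseqs2 (UniRef+Environmental)")

# Parentheses escaped, space not escaped: the value is split into two words.
escaped=$(count_args --msa-mode MMseqs2 \(UniRef+Environmental\))

# Simulated second parse: this is what the quoted command looks like once the
# first shell has removed the quotes and the words are joined back together.
flat='count_args --msa-mode MMseqs2 (UniRef+Environmental)'
sh -c "count_args() { echo \"\$#\"; }; $flat" 2>/dev/null \
  && reparse_status=0 || reparse_status=$?

echo "quoted=$quoted escaped=$escaped reparse_status=$reparse_status"
```

The quoted form arrives as two arguments and the escaped form as three, while the simulated second parse exits with a non-zero status because the bare parenthesis is a syntax error for the second shell. Under that assumption, quotes that are correct for the submitting shell would still not protect the value on the execution side.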
I'm also not sure there really is a solution. My Google-fu found only this, which has no solution to my silly problem. I also tried combinations of quotes, without success.
So would it be possible to change the options of --msa-mode to MMseqs2_UniRef+Environmental or MMseqs2_UniRef to make them more similar to single_sequence?
Alternatively, I'm also okay with hacking the batch.py script if you think I just need to change a few lines of code in the parameter/flag specification (or in other scripts where the flag options are parsed).
Thanks a lot, Nicco
Could you try colabfold_batch input output --msa-mode="MMseqs2 (UniRef+Environmental)"? In my experience that gets forwarded properly.
Hi,
thanks for the suggestion, but it still failed:
qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
-l virtual_free=${Num_Ram}G,h_vmem=${Num_Ram}G,h_rt=${Num_Hours}:59:00 \
-b y \
colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode="MMseqs2 (UniRef+Environmental)"
gave this error
syntax error near unexpected token `('
Probably the best idea would be to submit the job through a script. Alternatively, the following might work (not tested):
qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
-l virtual_free=${Num_Ram}G,h_vmem=${Num_Ram}G,h_rt=${Num_Hours}:59:00 \
-b y \
-- colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode="MMseqs2 (UniRef+Environmental)"
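One way to read "submit the job through a script" is to put the colabfold_batch call into a job script file and hand that file to qsub, instead of passing the command line with -b y. The sketch below is untested against a real cluster: colabfold_job.sh is a hypothetical file name, the qsub line is shortened from the one above and left commented out, and it assumes that -V makes INPUT_DIR and OUTPUT_DIR available on the compute node.

```shell
job_script="colabfold_job.sh"   # hypothetical file name

# Write the job script. The quoted 'EOF' keeps the text literal, so nothing
# is expanded or unquoted while the file is being created.
cat > "$job_script" <<'EOF'
#!/bin/bash
# These quotes are parsed once, by the shell that runs this file on the
# compute node, so the choice should stay a single argument.
colabfold_batch "$INPUT_DIR" "$OUTPUT_DIR" \
  --num-recycle 2 --num-models 1 \
  --msa-mode "MMseqs2 (UniRef+Environmental)"
EOF

# Syntax check only; colabfold_batch is not executed here.
sh -n "$job_script" && syntax_ok=yes || syntax_ok=no
echo "syntax_ok=$syntax_ok"

# Submit the file itself (no "-b y"); resource options as in the thread:
# qsub -q "$QUEUE_NAME" -V -pe smp "$Num_Processes" -l gpu="$Num_GPUs" "$job_script"
```

The design choice is that the parentheses never appear on the qsub command line at all, so only the quoting inside the job script matters.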
Hi Martin,
I do submit the job from a bash script in which qsub is the last command:
bash send_job_with_qsub.sh
and I get that error message with any combination of quotes.
@martin-steinegger I'm having the exact same problem.
No combination of quotes and escaping works:
colabfold_batch: error: argument --msa-mode: invalid choice: 'MMseqs2' (choose from 'MMseqs2 (UniRef+Environmental)', 'MMseqs2 (UniRef only)', 'single_sequence')
Input is /home/aljubetic/AF2/CF2/bin/colabfold_batch --num-recycle 6 --msa-mode="MMseqs2 (UniRef+Environmental)" --model-type AlphaFold2-ptm seq.fasta .
I've tried all possible combinations of escaping space and () and quoting.
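The error above shows that the parser received only MMseqs2 as the value, so the argument was split at the space somewhere between the command line and Python. Where that happens in this setup is not established in the thread. As a generic illustration only, the sketch below shows one common way such a split occurs in shell: a wrapper that forwards its arguments with an unquoted $@. first_arg, forward_unquoted and forward_quoted are hypothetical names and are not part of ColabFold.

```shell
# Hypothetical stand-in for the real program: prints its first argument.
first_arg() { printf '%s\n' "$1"; }

# Forwarding with an unquoted $@ lets the shell split each argument again.
forward_unquoted() { first_arg $@; }

# Forwarding with "$@" passes every argument through unchanged.
forward_quoted() { first_arg "$@"; }

split=$(forward_unquoted --msa-mode="MMseqs2 (UniRef+Environmental)")
intact=$(forward_quoted --msa-mode="MMseqs2 (UniRef+Environmental)")

echo "unquoted forwarding: $split"
echo "quoted forwarding:   $intact"
```

With unquoted forwarding the first argument is cut off after MMseqs2, which matches the shape of the invalid choice message; with quoted forwarding the full value survives.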
Was this issue fixed with the new release v1.3?
I believe PR #148 by YoshitakaMo (which was closed?) was going to solve this issue. PR #150 is also probably going to fix it.
Has this issue been solved? I found this page yesterday when I searched for the same error: syntax error near unexpected token `('. I got this error only when I executed a script with sh on the HPC; the same script ran fine from the bash command line on the login node. The suggestion "Change the shebang (the #!/bin/sh part) to #!/bin/bash" from stackexchange fixed my problem: https://unix.stackexchange.com/questions/143753/bash-syntax-error-near-unexpected-token. Maybe this is helpful here as well.
Yes, newer localcolabfold versions allow you to specify the strings as one of:
mmseqs2_uniref_env
mmseqs2_uniref
single_sequence
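Because these names contain no spaces or parentheses, they need no quoting at all. The sketch below is a small check of that property in plain shell, not a ColabFold run: count_args is a hypothetical helper, and the sh -c call simulates a command line being parsed a second time. The qsub line is adapted from the one earlier in the thread and is only printed, not executed.

```shell
# count_args is a hypothetical helper: it prints the number of arguments it got.
count_args() { echo "$#"; }

# Parsed once by the current shell.
direct=$(count_args --msa-mode mmseqs2_uniref_env)

# Parsed a second time by another shell: the result is the same.
twice=$(sh -c 'count_args() { echo "$#"; }; count_args --msa-mode mmseqs2_uniref_env')

echo "direct=$direct twice=$twice"

# Printed, not run: the submission line with the new choice name.
echo 'qsub ... -b y colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode mmseqs2_uniref_env'
```

Both counts are 2, so the value stays intact no matter how many shells the command line passes through.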
That's great, thanks!