
--msa-mode options without brackets and spaces?

Open Ni-Ar opened this issue 2 years ago • 6 comments

Context

I installed ColabFold in a conda environment that I activate before launching colabfold_batch on my institute's HPC cluster, which has GPUs available. I created the environment by roughly following the installation steps from localcolabfold, with small deviations. I created it today, so I'm using the latest version.

When doing

conda activate colabfold
colabfold_batch --help

It all works well and returns:

usage: colabfold_batch [-h] [--stop-at-score STOP_AT_SCORE]
                       [--num-recycle NUM_RECYCLE] [--num-models {1,2,3,4,5}]
                       [--recompile-padding RECOMPILE_PADDING]
                       [--model-order MODEL_ORDER] [--host-url HOST_URL]
                       [--data DATA]
                       [--msa-mode {MMseqs2 (UniRef+Environmental),MMseqs2 (UniRef only),single_sequence}]
                       [--model-type {auto,AlphaFold2-ptm,AlphaFold2-multimer}]
                       [--amber] [--templates] [--env] [--cpu]
                       [--rank {auto,plddt,ptmscore}]
                       [--pair-mode {unpaired,paired,unpaired+paired}]
                       [--recompile-all-models]
                       [--sort-queries-by {none,length,random}] [--zip]
                       [--overwrite-existing-results]
                       input results

positional arguments:
  input                 Can be one of the following: Directory with fasta/a3m
                        files, a csv/tsv file, a fasta file or an a3m file
  results               Directory to write the results to

optional arguments:
  -h, --help            show this help message and exit
  --stop-at-score STOP_AT_SCORE
.....

So I believe the installation worked well.

To launch ColabFold on my institute's cluster I use qsub. I wrote a small bash script that exports some required variables and arranges the inputs and outputs to fit my folder structure, so that the results integrate into my current project.

As a minimal test, I submit a job like this:

qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
         -l virtual_free=${Num_Ram}M,h_vmem=${Num_Ram}M,h_rt=${Num_Hours}:59:00 \
         -b y \
         colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1

it all works well.

Issue

The problem arises when I try to specify --msa-mode. In bash, parentheses and spaces make things more awkward. Specifying --msa-mode MMseqs2 (UniRef+Environmental) or --msa-mode MMseqs2 \(UniRef+Environmental\) doesn't work, because bash gives parentheses and spaces a special meaning that they don't have in the Python notebooks on Google Colab. The error message is just:

syntax error near unexpected token `('
`colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode MMseqs2 (UniRef+Environmental)'

Also, I'm not sure there is a solution. My Google-fu found only this, which has no solution to my problem. I also tried combinations of quotes without success.

So would it be possible to change the options of --msa-mode to MMseqs2_UniRef+Environmental or MMseqs2_UniRef to make it more similar to single_sequence?

Alternatively, I'm also okay with hacking the batch.py script (or other scripts where the flag options are parsed) if you think I only need to change a few lines of code in the flag specification.

Thanks a lot, Nicco

Ni-Ar avatar Jan 11 '22 15:01 Ni-Ar

Could you try colabfold_batch input output --msa-mode="MMseqs2 (UniRef+Environmental)"? In my experience that gets forwarded properly

konstin avatar Jan 12 '22 00:01 konstin

Hi,

thanks for the suggestion, but it still failed:

qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
         -l virtual_free=${Num_Ram}G,h_vmem=${Num_Ram}G,h_rt=${Num_Hours}:59:00 \
         -b y \
         colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode="MMseqs2 (UniRef+Environmental)"

gave this error

syntax error near unexpected token `('

Ni-Ar avatar Jan 12 '22 12:01 Ni-Ar
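
A possible reading of this error, sketched under an assumption the thread does not confirm: if the scheduler joins the command words into one string and a second shell parses that string again, the quotes typed on the qsub line are already gone by then, and the second shell sees a bare `(`. The function `run_via_second_shell` below is a hypothetical stand-in for that behaviour, not how qsub is implemented, and the extra quoting shown only helps in this simulation; posters in this thread report that quoting did not help them on their clusters.

```shell
#!/bin/bash
# Hypothetical stand-in for a launcher that joins its arguments into one
# string and hands that string to a second shell for parsing.
run_via_second_shell() {
    sh -c "$*"
}

# The quotes are consumed by this shell; the second shell sees a bare "(".
if run_via_second_shell echo --msa-mode="MMseqs2 (UniRef+Environmental)" 2>/dev/null; then
    first_result="ok"
else
    first_result="syntax error"
fi

# An extra layer of quoting survives the first parse and protects the second.
second_result=$(run_via_second_shell echo "'--msa-mode=MMseqs2 (UniRef+Environmental)'")

echo "one layer of quotes:  ${first_result}"
echo "two layers of quotes: ${second_result}"
```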

Probably the best idea would be to submit the job through a script. Alternatively, the following might work (not tested):

qsub -q ${QUEUE_NAME} -V -pe smp ${Num_Processes} -l gpu=${Num_GPUs} \
         -l virtual_free=${Num_Ram}G,h_vmem=${Num_Ram}G,h_rt=${Num_Hours}:59:00 \
         -b y \
         -- colabfold_batch ${INPUT_DIR} ${OUTPUT_DIR} --num-recycle 2 --num-models 1 --msa-mode="MMseqs2 (UniRef+Environmental)"

martin-steinegger avatar Jan 13 '22 16:01 martin-steinegger
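
A minimal sketch of what "submit the job through a script" could look like: colabfold_batch is called inside a job script that is handed to qsub, instead of being passed as a binary with -b y, so the quoted value is parsed once by the bash that runs the script. The file name `colabfold_job.sh` is hypothetical, the `#$ -S /bin/bash` line assumes a Grid Engine setup that honours embedded options, and the resource flags from the commands above are left out for brevity.

```shell
#!/bin/bash
# Write a job script. colabfold_batch is called inside it, so the quoted
# --msa-mode value is parsed only by the bash that executes the script.
# INPUT_DIR and OUTPUT_DIR are expected to come from the submitting
# environment (qsub -V exports it to the job).
job_script="colabfold_job.sh"   # hypothetical file name

cat > "${job_script}" <<'EOF'
#!/bin/bash
#$ -S /bin/bash
colabfold_batch "${INPUT_DIR}" "${OUTPUT_DIR}" \
    --num-recycle 2 --num-models 1 \
    --msa-mode "MMseqs2 (UniRef+Environmental)"
EOF

# Check that bash can parse the job script before submitting it.
bash -n "${job_script}" && echo "job script parses cleanly"

# Submit the script itself rather than a binary (add -q, -pe, -l as needed).
echo "submit with: qsub -V ${job_script}"
```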

Hi Martin,

I do submit the job from a bash script in which qsub is the last command:

bash send_job_with_qsub.sh

and I get that error message with any combination of quotations.

Ni-Ar avatar Jan 13 '22 16:01 Ni-Ar

@martin-steinegger I'm having the exact same problem. No combination of quotes and escaping works: colabfold_batch: error: argument --msa-mode: invalid choice: 'MMseqs2' (choose from 'MMseqs2 (UniRef+Environmental)', 'MMseqs2 (UniRef only)', 'single_sequence')

Input is /home/aljubetic/AF2/CF2/bin/colabfold_batch --num-recycle 6 --msa-mode="MMseqs2 (UniRef+Environmental)" --model-type AlphaFold2-ptm seq.fasta .

I've tried all possible combinations of escaping the space and the parentheses, and of quoting.

ajasja avatar Mar 25 '22 12:03 ajasja
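
The `invalid choice: 'MMseqs2'` message suggests that the value was split at the space somewhere before it reached the argument parser. The sketch below only shows how that can happen in bash when a variable is expanded without quotes; `show_args` is a hypothetical helper, and whether an unquoted expansion is what happened in the setup above is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: report how many arguments arrived and the first one.
show_args() {
    echo "$# argument(s), first: $1"
}

mode="MMseqs2 (UniRef+Environmental)"

# Unquoted expansion: the shell splits the value at the space.
split_result=$(show_args --msa-mode=$mode)

# Quoted expansion: the value stays a single argument.
whole_result=$(show_args --msa-mode="$mode")

echo "unquoted: ${split_result}"
echo "quoted:   ${whole_result}"
```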

Was this issue fixed with the new release v1.3?

I believe PR #148 by YoshitakaMo (which was closed?) was going to solve this issue. PR #150 will probably fix it as well.

Ni-Ar avatar Jul 08 '22 11:07 Ni-Ar

Has this issue been solved? I found this page yesterday when I searched for the same error: syntax error near unexpected token `('. I got this error only when the script was executed by sh on the HPC; the same script ran fine from an interactive bash on the login node. The suggestion "Change the shebang (the #!/bin/sh part) to #!/bin/bash" from stackexchange fixed my problem: https://unix.stackexchange.com/questions/143753/bash-syntax-error-near-unexpected-token. Maybe this is helpful here as well.

ChristianRohde avatar Aug 01 '23 09:08 ChristianRohde

Yes, newer localcolabfold versions allow you to specify the strings as one of:

mmseqs2_uniref_env
mmseqs2_uniref
single_sequence

milot-mirdita avatar Aug 01 '23 09:08 milot-mirdita
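
With these names, the value contains no spaces or parentheses, so no quoting is needed at any level. A minimal sketch, assuming an installed version that accepts the names listed above; the fallback directory names `input` and `output` are placeholders.

```shell
#!/bin/bash
# Build the command as an array so every word stays a separate argument.
msa_mode="mmseqs2_uniref_env"
cmd=(colabfold_batch "${INPUT_DIR:-input}" "${OUTPUT_DIR:-output}"
     --num-recycle 2 --num-models 1 --msa-mode "${msa_mode}")

# Print the command for inspection; run it with "${cmd[@]}" or append the
# same words to the qsub line used earlier in the thread.
printf '%s ' "${cmd[@]}"
echo
```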

That's great, thanks!

Ni-Ar avatar Aug 01 '23 09:08 Ni-Ar