Alphafold arguments

Open · pcuniasse opened this issue 2 years ago · 1 comment

Dear Sir,

We are trying to install a version of AlphaFold2 at our HPC center, where we have access to a large number of GPUs.

For security reasons, we cannot use Docker or Singularity. Instead, we use a container system (PCOCC) designed specifically for our HPC center (TGCC).

We succeeded in building a container for AlphaFold2 with this system. We ran basic monomer predictions and verified that the results agree with those from the notebook.

We have seen that some AF2 arguments (the database paths, etc.) can be set in a shell script (attached), which we adapted to our container system.

For instance, we have added the --run_relax=False argument and checked that it was correctly taken into account.

We deduced (probably wrongly) that the arguments defined in the main script “run_alphafold.py”, such as the following,

from absl import flags

flags.DEFINE_boolean('benchmark', False, 'Run multiple JAX model evaluations '
                     'to obtain a timing that excludes the compilation time, '
                     'which should be more indicative of the time required for '
                     'inferencing many proteins.')

flags.DEFINE_integer('random_seed', None, 'The random seed for the data '
                     'pipeline. By default, this is randomly generated. Note '
                     'that even if this is set, Alphafold may still not be '
                     'deterministic, because processes like GPU inference are '
                     'nondeterministic.')

flags.DEFINE_integer('num_multimer_predictions_per_model', 5, 'How many '
                     'predictions (each with a different random seed) will be '
                     'generated per model. E.g. if this is 2 and there are 5 '
                     'models then there will be 10 predictions per input. '
                     'Note: this FLAG only applies if model_preset=multimer')

flags.DEFINE_boolean('use_precomputed_msas', False, 'Whether to read MSAs that '
                     'have been written to disk instead of running the MSA '
                     'tools. The MSA files are looked up in the output '
                     'directory, so it must stay the same between multiple '
                     'runs that are to reuse the MSAs. WARNING: This will not '
                     'check if the sequence, database or configuration have '
                     'changed.')

flags.DEFINE_boolean('run_relax', True, 'Whether to run the final relaxation '
                     'step on the predicted models. Turning relax off might '
                     'result in predictions with distracting stereochemical '
                     'violations but might help in case you are having issues '
                     'with the relaxation stage.')

flags.DEFINE_boolean('use_gpu_relax', None, 'Whether to relax on GPU. '
                     'Relax on GPU can be much faster than CPU, so it is '
                     'recommended to enable if possible. GPUs must be available'
                     ' if this setting is enabled.')

could be set in the shell script, for instance via the following line:

command_args="$myargs --use_precomputed_msas=True --run_relax=False --use_gpu_relax=False --output_dir=$ALPHAFOLD_OUTPUT --model_preset=${model_preset} --max_template_date=${max_template_date} --benchmark=0 --logtostderr"
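To check whether values such as --use_precomputed_msas=True and --benchmark=0 are read as intended, here is a minimal Python sketch. It is a stdlib imitation, not absl itself: it assumes the boolean syntax absl's flags module is generally understood to accept (--name, --noname, and --name= followed by true/t/1 or false/f/0, case-insensitive). The function name parse_bool_flags and the output path are hypothetical, and the $myargs, ${model_preset} and ${max_template_date} parts of the line are left out.

```python
import shlex

# Boolean spellings accepted by this sketch. They mirror what absl's boolean
# flag parser is understood to accept (case-insensitive); this is an
# illustration, not absl itself.
_TRUE = {"true", "t", "1"}
_FALSE = {"false", "f", "0"}


def parse_bool_flags(command_args: str, bool_names: set[str]) -> dict[str, bool]:
    """Return {flag_name: value} for the boolean flags found in command_args.

    Recognises --name (True), --noname (False) and --name=<value>.
    Non-boolean flags such as --output_dir are ignored.
    """
    values: dict[str, bool] = {}
    for token in shlex.split(command_args):
        if not token.startswith("--"):
            continue
        name, sep, raw = token[2:].partition("=")
        if name in bool_names:
            if not sep:
                values[name] = True  # bare --name means True
            elif raw.lower() in _TRUE:
                values[name] = True
            elif raw.lower() in _FALSE:
                values[name] = False
            else:
                raise ValueError(f"invalid boolean for --{name}: {raw!r}")
        elif not sep and name.startswith("no") and name[2:] in bool_names:
            values[name[2:]] = False  # --noname means False
    return values


# Boolean flags from the run_alphafold.py excerpt above.
BOOL_FLAGS = {"benchmark", "use_precomputed_msas", "run_relax", "use_gpu_relax"}

# The command_args line with its shell variables left out; the output
# directory is a hypothetical placeholder.
line = ("--use_precomputed_msas=True --run_relax=False --use_gpu_relax=False "
        "--output_dir=/scratch/out --benchmark=0 --logtostderr")
print(parse_bool_flags(line, BOOL_FLAGS))
# {'use_precomputed_msas': True, 'run_relax': False, 'use_gpu_relax': False, 'benchmark': False}
```

Under these assumptions, --use_precomputed_msas=True and --benchmark=0 are both valid boolean settings, so the flag syntax itself is probably not why the precomputed MSAs were ignored.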

This worked for --run_relax=False but not for --use_precomputed_msas=True: we have not managed to get the code to read precomputed MSAs instead of running the alignments itself.

Do you have any suggestions or documentation on setting these arguments by modifying the attached shell script?

Thanks in advance.

Best regards.

Philippe Cuniasse.

Jun 03 '22 13:06 pcuniasse

The "--use_precomputed_msas=True" argument works for me, but I had to make sure that:

1) the output directory is exactly right: the exact directory name, taken from the FASTA file, as would have been created de novo, with the MSAs exactly under an "msas" subdirectory and in the exact chain-letter subdirectories, e.g. A for the first chain specified in the FASTA file, B for the second, etc., skipping a letter correctly if one of these is -. Just look at an "msas" subdirectory created de novo and follow its directory structure exactly, without deviation;

3) the MSAs were ideally created by a prior run with the multimer preset, because reusing precomputed files from a monomer run only gives you 4 of the 5 files and skips the one that actually takes a long time to compute, which takes away the advantage of precomputing anyway.

Also, I think the command arguments are taken from run_docker.py within docker/; at least, that's how it seems to me.
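The directory advice in this reply can be turned into a small check. This is a minimal Python sketch, assuming the layout the reply describes: <output_dir>/<FASTA file name without extension>/msas/<chain letter>/ for multimer runs. Rather than hard-coding AlphaFold's MSA file names, it compares the tree against an "msas" directory from a de novo run, as the reply recommends. The helper names (expected_msas_dir, chain_dirs, missing_files) and the file name hits.sto in the demo are hypothetical, and the letter skipping mentioned in the reply is not modelled.

```python
import string
import tempfile
from pathlib import Path


def expected_msas_dir(output_dir, fasta_path) -> Path:
    """Assumed layout: <output_dir>/<FASTA name without extension>/msas."""
    return Path(output_dir) / Path(fasta_path).stem / "msas"


def chain_dirs(msas_dir, n_chains: int) -> list[Path]:
    """Chain subdirectories A, B, C, ... in FASTA order (multimer runs).

    The letter skipping mentioned in the reply is not modelled here.
    """
    if not 0 <= n_chains <= 26:
        raise ValueError("this sketch only handles up to 26 chains")
    return [Path(msas_dir) / c for c in string.ascii_uppercase[:n_chains]]


def missing_files(reference_msas, target_msas) -> list[str]:
    """Files under a de novo msas dir that are absent from target_msas."""
    reference_msas, target_msas = Path(reference_msas), Path(target_msas)
    missing = []
    for ref_file in sorted(reference_msas.rglob("*")):
        if ref_file.is_file():
            rel = ref_file.relative_to(reference_msas)
            if not (target_msas / rel).is_file():
                missing.append(rel.as_posix())
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A de novo run with two chains; "hits.sto" is a hypothetical name.
        reference = Path(tmp, "denovo", "msas")
        for d in chain_dirs(reference, 2):
            d.mkdir(parents=True)
            (d / "hits.sto").write_text("")
        # A rerun whose msas tree only has chain A filled in.
        target = expected_msas_dir(Path(tmp, "rerun"), "/data/complex.fasta")
        chain_dirs(target, 1)[0].mkdir(parents=True)
        (target / "A" / "hits.sto").write_text("")
        print(target.relative_to(tmp).as_posix())  # rerun/complex/msas
        print(missing_files(reference, target))    # ['B/hits.sto']
```

Comparing against a real de novo tree avoids guessing file names. As we read the help text of use_precomputed_msas, MSA files are looked up in the output directory, so any file this check reports as missing would presumably not be reused.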

Jun 03 '22 14:06 davidyanglee
