
What are the differences between the --augustus_species and --busco_seed_species options of the predict subcommand?

Open goshng opened this issue 3 years ago • 1 comments

I am learning to use funannotate to annotate a genome without any other evidence such as RNA sequencing data. Reading the documentation, I found two ways of running funannotate predict to annotate a genome from only a genome sequence file:

First, at

https://funannotate.readthedocs.io/en/latest/predict.html#explanation-of-inputs-and-options

funannotate predict -i mygenome.fa -o output_folder -s "Aspergillus nidulans" \
    --augustus_species anidulans

Second, at

https://funannotate.readthedocs.io/en/latest/tutorials.html#genome-assembly-only

funannotate predict -i MyAssembly.fa -o fun \
    --species "Pseudogenus specicus" --strain JMP12345 \
    --busco_seed_species botrytis_cinerea --cpus 12

funannotate predict prints the two options like this:

--augustus_species      Augustus species config. Default: uses species name
--busco_seed_species    Augustus pre-trained species to start BUSCO. Default: anidulans

First, I do not know whether I should use both options or only one of them. Second, the arguments for both can be chosen from the first column of the funannotate species output:

$ funannotate species
  Species                                    Augustus               GeneMark   Snap   GlimmerHMM   CodingQuarry   Date
  Conidiobolus_coronatus                     augustus pre-trained   None       None   None         None           2021-09-11
  E_coli_K12                                 augustus pre-trained   None       None   None         None           2021-09-11
  Xipophorus_maculatus                       augustus pre-trained   None       None   None         None           2021-09-11

However, I do not know what the default of --augustus_species (Default: uses species name) means. Could you please explain the differences between the two options, or point me to where in the documentation I can read about them? The documentation is available at

https://funannotate.readthedocs.io/en/latest/index.html

Thank you!

goshng, Sep 30 '21 08:09

--busco_seed_species is the species used to run BUSCO (it defaults to anidulans if not given); the BUSCO results are then used to train Augustus de novo. --augustus_species specifies a pre-trained species for running Augustus directly: if you set --augustus_species, funannotate will not train Augustus but will just run Augustus with those parameters. If --augustus_species is not set, the training set name is derived from the --species, --strain, and --isolate parameters, i.e. if you passed:

funannotate predict --species "Aspergillus nidulans" --isolate ABC123

Then the script will turn this into aspergillus_nidulans_ABC123 as the value for --augustus_species. Since that species doesn't exist, it will run BUSCO and use those results to train Augustus.
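To make the fallback concrete, here is a rough shell sketch of the behaviour described above. It is not funannotate's actual code: the helper `derive_augustus_name` and the short `pretrained` list are hypothetical, and the sketch handles only a single strain or isolate value. It only mimics how a species name plus isolate could become an Augustus species name and how the train-or-reuse decision might follow.

```shell
#!/bin/sh
# Hypothetical helper (not part of funannotate): build the Augustus species
# name from a --species value and an optional --strain/--isolate value.
derive_augustus_name() {
    # Lower-case the species name and replace spaces with underscores.
    base=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ' ' '_')
    if [ -n "$2" ]; then
        printf '%s_%s\n' "$base" "$2"
    else
        printf '%s\n' "$base"
    fi
}

name=$(derive_augustus_name "Aspergillus nidulans" "ABC123")
echo "derived name: $name"

# Assumed sample of pre-trained species; in practice the real list comes
# from the first column of `funannotate species`.
pretrained="anidulans botrytis_cinerea"

case " $pretrained " in
    *" $name "*) action="reuse" ;;   # parameters exist: run Augustus directly
    *)           action="train" ;;   # no parameters: run BUSCO, then train
esac
echo "action: $action"
```

With these inputs the derived name is aspergillus_nidulans_ABC123, which is not in the sample list, so the sketch takes the "train" branch. That corresponds to running BUSCO with --busco_seed_species and training Augustus from the results.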

nextgenusfs, Oct 04 '21 03:10