Develop full suite of tests for manual execution

Open jfy133 opened this issue 2 years ago • 6 comments

Description of feature

A major problem we currently have during development is that our CI tests are nowhere near comprehensive enough, because the pipeline uses extremely large database files that do not fit within GitHub Actions (GHA) resource allocations.

We should develop and document a suite of manual tests that developers should run on their own infrastructure to ensure the pipeline is indeed working as intended.

mag missing configs and tests

For Automated CI

  • [ ] Config one
    • [ ] Direct fastq input
  • [x] Config two
    • [x] #594
  • [ ] ~Config three~

For manual CI

Does not need a database
  • [x] Config four
    • [x] #592
Databases on AWS
  • [ ] Config five (shared with below)
    • [ ] CAT
    • [ ] GTDB
Databases NOT on AWS
  • [ ] Config five (shared with above)
    • [ ] CheckM (in CI but not in a config)
    • [ ] GUNC
    • [ ] Metaeuk

jfy133 avatar Sep 01 '23 09:09 jfy133

Metaeuk

For MetaEuk, specifying params.metaeuk_mmseqs_db = "UniProtKB/Swiss-Prot" only entails downloading a small database: doing a quick check, the fasta it's based on is only 87 MB. So that should potentially be feasible to run in a more automated way?
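
For illustration, a test config enabling this might look roughly like the sketch below. Only the `metaeuk_mmseqs_db` parameter and its value come from this comment; putting it in a dedicated config file is an assumption, and the file name in the code comment is hypothetical.

```groovy
// Hypothetical file: conf/test_metaeuk.config (the name is an assumption)
params {
    // Small database: the fasta it is based on is only about 87 MB
    metaeuk_mmseqs_db = "UniProtKB/Swiss-Prot"
}
```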

prototaxites avatar Feb 16 '24 14:02 prototaxites

@prototaxites

Yeah that definitely should be feasible! Is it a single file with a public URL?

jfy133 avatar Feb 16 '24 15:02 jfy133

> @prototaxites
>
> Yeah that definitely should be feasible! Is it a single file with a public URL?

"UniProtKB/Swiss-Prot" is the string passed to the mmseqs databases command, which downloads the latest release of the database AFAIK. Now that I think about it, I'm not sure there's a way to specify a version, unfortunately, which limits reproducibility.

An alternative would be to specify the URL of a fasta file to --metaeuk_db. In the MetaEuk module test, I passed it the yeast .faa in the test-data repo: https://github.com/nf-core/modules/blob/master/tests/modules/nf-core/metaeuk/easypredict/main.nf, which seemed to work OK, but it might be better to find a prokaryotic file to use with the test data.
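
A rough sketch of that alternative as a config fragment, assuming `--metaeuk_db` maps to `params.metaeuk_db` in the usual Nextflow way. The URL is a placeholder, not a real file; a suitable prokaryotic protein fasta would still need to be chosen.

```groovy
params {
    // Placeholder URL (hypothetical): swap in a fixed, publicly hosted protein fasta
    metaeuk_db = "https://example.org/path/to/proteins.faa"
}
```

Pointing at a fixed file like this would presumably avoid the versioning problem with the mmseqs databases download, since the same fasta is fetched on every run.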

prototaxites avatar Feb 16 '24 16:02 prototaxites