modules

New module: kraken2/build

Open alxndrdiaz opened this issue 2 years ago • 3 comments

PR checklist

Closes #2953

  • [ ] This comment contains a description of changes (with reason).
  • [ ] If you've fixed a bug or added code that should be tested, add tests!
  • [ ] If you've added a new tool, have you followed the module conventions in the contribution docs?
  • [ ] If necessary, include test data in your PR.
  • [ ] Remove all TODO statements.
  • [ ] Emit the versions.yml file.
  • [ ] Follow the naming conventions.
  • [ ] Follow the parameters requirements.
  • [ ] Follow the input/output options guidelines.
  • [ ] Add a resource label.
  • [ ] Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • [ ] nf-core modules test <MODULE> --profile docker
      • [ ] nf-core modules test <MODULE> --profile singularity
      • [ ] nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • [ ] nf-core subworkflows test <SUBWORKFLOW> --profile conda

alxndrdiaz • Mar 19 '24 00:03

The following error is related to expected file names for .dmp taxonomy files:

    Command error:
      Creating sequence ID to taxonomy ID map (step 1)...
      lookup_accession_numbers: expected TAB not found in taxonomy/prot.accession2taxid
      Found 0/13 targets, searched through 1 accession IDs, search complete.
      lookup_accession_numbers: 13/13 accession numbers remain unmapped, see unmapped.txt in DB directory
      Sequence ID to taxonomy ID map complete. [0.009s]
      Estimating required capacity (step 2)...
      Estimated hash table requirement: 157988 bytes
      Capacity estimation complete. [0.006s]
      Building database files (step 3)...
      build_db: error opening taxonomy//nodes.dmp: No such file or directory

The expected file names are names.dmp and nodes.dmp, so I will first try renaming these files in the add module. PR to fix the taxonomy file names: #5214
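The rename the add module would need can be sketched as a small staging step. This is a minimal Python illustration under stated assumptions, not the nf-core module's code. The helper name `stage_taxonomy`, the choice to copy instead of rename, and the placeholder file contents are all hypothetical. Only the target names `names.dmp` and `nodes.dmp` and the `taxonomy/` subdirectory come from the error log above.

```python
import shutil
import tempfile
from pathlib import Path

# kraken2-build looks for these files in <db>/taxonomy/ (see the build_db error above).
EXPECTED_NAMES = ("names.dmp", "nodes.dmp")


def stage_taxonomy(names_dmp: Path, nodes_dmp: Path, db_dir: Path) -> Path:
    """Copy taxonomy dumps into db_dir/taxonomy/ under the expected names.

    Hypothetical helper: the inputs may carry any file name (for example a
    prefixed test-data name); they are copied to names.dmp and nodes.dmp.
    """
    tax_dir = db_dir / "taxonomy"
    tax_dir.mkdir(parents=True, exist_ok=True)
    for src, target in zip((names_dmp, nodes_dmp), EXPECTED_NAMES):
        dest = tax_dir / target
        # Skip the copy if the input is already staged under the expected name.
        if src.resolve() != dest.resolve():
            shutil.copyfile(src, dest)
    missing = [n for n in EXPECTED_NAMES if not (tax_dir / n).is_file()]
    if missing:
        raise FileNotFoundError(f"missing taxonomy files in {tax_dir}: {missing}")
    return tax_dir


def demo() -> list:
    """Stage two prefixed placeholder dumps in a temp dir and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        names = root / "test_names.dmp"
        nodes = root / "test_nodes.dmp"
        names.write_text("placeholder names\n")
        nodes.write_text("placeholder nodes\n")
        tax_dir = stage_taxonomy(names, nodes, root / "db")
        return sorted(p.name for p in tax_dir.iterdir())
```

`demo()` stages two prefixed files and lists the resulting `taxonomy/` directory. Copying rather than moving leaves the original inputs untouched. A symlink would also work where the filesystem allows it.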

alxndrdiaz • Mar 19 '24 02:03

It seems to be starting to work now, @alxndrdiaz!

jfy133 • Mar 19 '24 08:03


It seems to work now. The test assertions also need to be improved.

alxndrdiaz • Mar 19 '24 14:03

One last (?) problem: two files (opts.k2d and unmapped.txt) seem to change between test runs. The following assertion fails if these files are included:

    assertAll(
        { assert process.success },
        { assert process.out.db.get(0).get(1) ==~ ".*/test" },
        { assert snapshot(
                path("${process.out.db[0][1]}/hash.k2d"),
                path("${process.out.db[0][1]}/taxo.k2d"),
                path("${process.out.db[0][1]}/opts.k2d"),
                path("${process.out.db[0][1]}/unmapped.txt")
            ).match()
        }
    )

In this case, opts.k2d and unmapped.txt had different MD5 checksums between test runs:

    Left:
    [
        "hash.k2d:md5,e9984a5e98f87c0488cb5e7618d5bbe0",
        "taxo.k2d:md5,29d65b1796e09191fd7bdcaa24130459",
    !   "opts.k2d:md5,de7a6df4eb9f322f053724a3d05ad8aa",
    !   "unmapped.txt:md5,f6c3f052cfd71c5cd7133f7f58ddcb52"
    ]

    Right:
    [
        "hash.k2d:md5,e9984a5e98f87c0488cb5e7618d5bbe0",
        "taxo.k2d:md5,29d65b1796e09191fd7bdcaa24130459",
    !   "opts.k2d:md5,bbef3355da216a020ddc1b36db249910",
    !   "unmapped.txt:md5,1c04243f50ce0e7769ad7dce51285c7d"
    ]
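One way around the changing checksums is to hash only the files expected to be reproducible and record the others by name only, so the test still checks that they exist. The sketch below shows that idea in Python rather than nf-test. The set of unstable files (opts.k2d, unmapped.txt) comes from the diff above. The function names `db_snapshot` and `diff_snapshots`, and the `name:md5,<hash>` entry format (mimicking the snapshot lines above), are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path

# Files whose checksums differed between runs in the diff above.
UNSTABLE = {"opts.k2d", "unmapped.txt"}


def md5sum(path: Path) -> str:
    """Return the hex MD5 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def db_snapshot(db_dir: Path, files: list) -> list:
    """Build entries: 'name:md5,<hash>' for stable files, the bare name otherwise.

    Raises FileNotFoundError if any listed file is missing, so unstable
    files are still checked for existence.
    """
    entries = []
    for name in files:
        path = db_dir / name
        if not path.is_file():
            raise FileNotFoundError(path)
        entries.append(name if name in UNSTABLE else f"{name}:md5,{md5sum(path)}")
    return entries


def diff_snapshots(old: list, new: list) -> list:
    """Return indices of entries that differ (the lines marked '!' in the diff above)."""
    return [i for i, (a, b) in enumerate(zip(old, new)) if a != b]


def demo() -> tuple:
    """Simulate two runs whose unstable files differ; compare raw vs filtered snapshots."""
    files = ["hash.k2d", "taxo.k2d", "opts.k2d", "unmapped.txt"]
    snapshots = []
    with tempfile.TemporaryDirectory() as tmp:
        for run, noise in enumerate((b"run-a", b"run-b")):
            db = Path(tmp) / f"run{run}"
            db.mkdir()
            (db / "hash.k2d").write_bytes(b"same hash table")
            (db / "taxo.k2d").write_bytes(b"same taxonomy")
            (db / "opts.k2d").write_bytes(noise)      # differs per run
            (db / "unmapped.txt").write_bytes(noise)  # differs per run
            raw = [f"{n}:md5,{md5sum(db / n)}" for n in files]
            snapshots.append((raw, db_snapshot(db, files)))
    (raw_a, stable_a), (raw_b, stable_b) = snapshots
    return diff_snapshots(raw_a, raw_b), diff_snapshots(stable_a, stable_b)
```

In `demo()` the raw snapshots differ at the opts.k2d and unmapped.txt positions, while the filtered snapshots match. In an nf-test file, the analogous move would be to snapshot only the stable paths and assert on the names of the others. Check the nf-test documentation for the exact helpers before relying on this.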

alxndrdiaz • Mar 21 '24 23:03