sourmash
sourmash copied to clipboard
WIP: adjust warnings around tax abund and provide v5 upgrades to `tax metagenome`
Add new command line arguments to tax metagenome around abundances. --use-abundances turns on the use of sketch abundances for krona and XXX output formats, while --ignore-abundances/--no-abundances turns them off.
Also provides a warning if no abundances are available for tax metagenome to use, or if --use-abundances/--ignore-abundances is not explicitly set (for sourmash v4).
For sourmash v5, --use-abundances becomes default and it is an error to not have abundances. This can be overridden with --ignore-abundances.
For sourmash v5, this PR also changes the default output format for tax genome and tax metagenome to -F human from -F csv_summary (https://github.com/sourmash-bio/sourmash/issues/2162).
Uses --v4/--v5 command line switches and test infrastructure, ref https://github.com/sourmash-bio/sourmash/issues/3076.
TODO:
- [x] finish writing tests
- [x] test: warning if no abundances provided/used (v4)
- [x] test: no warning if --ignore-abund, and no abundances used (v4/v5)
- [x] test: error if no abundances provided & no ignore (v5)
- [x] test: error if no abundances provided and
--use-abund - [x] test: abundances used if
--use-abund(v4/v5) - [x] test: abundances used if v5
- [x] test: default output format for metagenome is human (v5)
- [x] test: default output format for genome is csv_summary (v4)
- [x] test: abundances not used with
tax genome - [ ] fix @CTB and XXX notes stuff
- [ ] actually test & examine the krona and kreport output formats ;)
- [ ] test: default output format for metagenome is human (v5)
- [ ] test: default output format for genome is csv_summary (v4)
- [ ] add comments to #3577 and #3598
- [ ] add documentation
Fixes https://github.com/sourmash-bio/sourmash/issues/2162 Fixes https://github.com/sourmash-bio/sourmash/issues/3577 Fixes https://github.com/sourmash-bio/sourmash/issues/3598
Codecov Report
:x: Patch coverage is 96.10390% with 3 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 88.19%. Comparing base (c94f143) to head (7ec27ae).
:warning: Report is 54 commits behind head on latest.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/sourmash/tax/tax_utils.py | 92.85% | 1 Missing and 1 partial :warning: |
| src/sourmash/tax/__main__.py | 97.56% | 0 Missing and 1 partial :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## latest #3711 +/- ##
==========================================
+ Coverage 88.17% 88.19% +0.01%
==========================================
Files 137 137
Lines 22491 22552 +61
Branches 2281 2298 +17
==========================================
+ Hits 19831 19889 +58
- Misses 2347 2348 +1
- Partials 313 315 +2
| Flag | Coverage Δ | |
|---|---|---|
| hypothesis-py | 25.27% <9.09%> (-0.12%) |
:arrow_down: |
| python | 92.66% <96.10%> (+0.01%) |
:arrow_up: |
| rust | 81.62% <ø> (ø) |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
:rocket: New features to boost your workflow:
- :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
tax metagenomein v5 (and with--v5) should REQUIRE abundances unless an option (--no-abund?) is provided;tax metagenomein v4 should WARN or ERROR if (a) no abundances are provided (b) abundances are available but not being used
I like it!
See new validation workflow here: https://github.com/sourmash-bio/2025-validate-sourmash-tax-formats.
Ready for review @bluegenes!
This got really complicated from a code and testing perspective. My suggestion for detailed review efforts are to confirm that the various test_metagenome_abund_reporting_* tests match the PR descr and docs... Those tests should be comprehensive for the abundance-related changes in this PR.