datasets
"Total" sequence length doesn't include organelles
Before opening an issue, please:
- [x] Make sure you are using the latest version using datasets --version
- [x] Review our documentation
Describe the bug
Hello NCBI !
The assembly GCA_964199945.1 is reported as having a "Total Sequence Length" of 1,327,610,284 bp, but the FASTA file actually contains 1,328,070,353 bp. The difference of 460,069 bp is exactly the combined length of the mitochondrial (MT) and plastid sequences.
| assmstats-total-sequence-len | Assembly Stats Total Sequence Length |
|---|---|
To Reproduce
$ datasets summary genome accession GCA_964199945.1 --as-json-lines | dataformat tsv genome --fields assmstats-total-sequence-len --elide-header
1327610284
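The FASTA-side figure can be checked independently by summing the sequence lengths in the file. The following is a minimal sketch, assuming the genome FASTA has already been downloaded and decompressed locally; the function name `fasta_total_length` and the file name `genome.fna` are hypothetical and not part of the datasets tooling.

```python
import io


def fasta_total_length(handle):
    """Sum the number of residues over all records in a FASTA stream."""
    total = 0
    for line in handle:
        if line.startswith(">"):
            # Header line: starts a new record and carries no sequence.
            continue
        total += len(line.strip())
    return total


# Figures quoted in this report.
REPORTED_TOTAL = 1_327_610_284  # assmstats-total-sequence-len
FASTA_TOTAL = 1_328_070_353     # bases counted in the FASTA file

# Gap attributed to the MT and plastid sequences.
print(FASTA_TOTAL - REPORTED_TOTAL)  # 460069

# Hypothetical usage on a local file:
# with open("genome.fna") as fh:
#     print(fasta_total_length(fh))
```

Reading line by line keeps memory use flat, which matters for an assembly of this size.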
Expected behavior
I would expect the "total" sequence length to include everything. Otherwise, I would call it the length of the "nuclear" genome only.
Best regards, Matthieu
Hi muffato
Thank you for highlighting this issue. I agree that it could be clearer, and we’ll work on improving it.
Nuala