
Limited number of genome downloads for some taxa

Open mkdevesh opened this issue 3 months ago • 0 comments

I am trying to download genomes at assembly level 'chromosome' for several bacterial taxa, but I noticed that fewer genomes are being downloaded than expected. I cross-checked the number of E. coli genomes and it is 4807 as of now, while the download collects only 763 records. The commands I used are below. I also tried the 'dehydrated' option, with the same result. I then tried the 'complete' assembly level, which returned 7,763 records. That is surprising, since I expected it to be smaller than the chromosome-level count. The two numbers are also oddly similar (763 and 7,763), and I have no idea why.

E:\R\blast_test>datasets download genome taxon 562 --assembly-level chromosome --dehydrated --filename Coli2_dataset.zip
Collecting 763 genome records [================================================] 100% 763/763
Downloading: Coli2_dataset.zip    331kB valid zip structure -- files not checked
Validating package [================================================] 100% 4/4

E:\R\blast_test>datasets download genome taxon 562 --assembly-level chromosome --dehydrated --include genome --filename Coli2_dataset.zip
Collecting 763 genome records [================================================] 100% 763/763
Downloading: Coli2_dataset.zip    331kB valid zip structure -- files not checked
Validating package [================================================] 100% 4/4

E:\R\blast_test>datasets download genome taxon 562 --assembly-level complete --dehydrated --include genome --filename Coli2_dataset.zip
Collecting 7,763 genome records [================================================] 100% 7763/7763
Downloading: Coli2_dataset.zip    3.18MB valid zip structure -- files not checked
Validating package [================================================] 100% 4/4

E:\R\blast_test>datasets --version
datasets version: 16.10.1
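
To cross-check what a downloaded package actually contains, independently of the progress bar, the records in its report file can be tallied by assembly level. The following is only a rough sketch: it assumes the zip holds a JSON Lines report at `ncbi_dataset/data/assembly_data_report.jsonl` and that each record stores its level under `assemblyInfo.assemblyLevel`. Both the path and the field names are assumptions and should be checked against the package in hand. The function name `count_assembly_levels` is made up for illustration.

```python
import json
import zipfile
from collections import Counter

# Assumed location of the assembly report inside the package (verify locally).
REPORT_PATH = "ncbi_dataset/data/assembly_data_report.jsonl"


def count_assembly_levels(zip_path, report_path=REPORT_PATH):
    """Tally the report's records by assembly level.

    Records without the assumed assemblyInfo.assemblyLevel field are
    counted under "unknown".
    """
    levels = Counter()
    with zipfile.ZipFile(zip_path) as package:
        with package.open(report_path) as report:
            for raw_line in report:
                line = raw_line.strip()
                if not line:
                    continue  # skip blank lines
                record = json.loads(line)
                info = record.get("assemblyInfo", {})
                levels[info.get("assemblyLevel", "unknown")] += 1
    return levels
```

Calling `count_assembly_levels("Coli2_dataset.zip")` and summing the values gives the number of records in the package, which can be compared with the count reported on the "Collecting ... genome records" line.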

Now I have to question all of my downloads, as this has become unreliable. Please fix this issue so that the tool downloads the correct number of genomes available at that time. Thanks.

mkdevesh, Mar 28 '24 11:03