
datasets rehydrate error during download

Open VPEMERIDIAN opened this issue 2 years ago • 1 comments

Hello,

I've successfully downloaded a dehydrated dataset of Campylobacter coli, but I'm getting this error during the rehydration process:

~/D$ ./datasets rehydrate --directory Campylo_coli/
Found 38905 files for rehydration
Completed 467 of 38905 [------------------------------------------------] 1%
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490195.1/GCF_001490195.1_EC3511_genomic.fna 1.59MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490415.1/GCF_001490415.1_H042120298_genomic.fna 1.9MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490335.1/GCF_001490335.1_EC3952_genomic.fna 1.66MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490395.1/GCF_001490395.1_SS_2234_genomic.fna 1.68MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490255.1/GCF_001490255.1_EC3525_genomic.fna 1.8MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490315.1/GCF_001490315.1_CCN257_genomic.fna 1.82MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490295.1/GCF_001490295.1_EC4297_genomic.fna 1.68MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490375.1/GCF_001490375.1_SS_2356_genomic.fna 1.68MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490215.1/GCF_001490215.1_EC3575_genomic.fna 1.59MB done
Downloading: Campylo_coli/ncbi_dataset/data/GCF_001490235.1/GCF_001490235.1_H072820535_genomic.fna 1.71MB done
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x87db75]

goroutine 13 [running]:
main/datasets/datasets.downloadFileWorker.func2(0xc0009b5f58, 0xc000994000, 0xc0009b5f40, 0xc000200080, 0xc000032480)
    /export/home/tomcat/TeamCity/Agent4/work/c6e6852d9a243866/dataloader/apps/public/Datasets/datasets/datasets/Rehydrate.go:191 +0x375
main/datasets/datasets.downloadFileWorker(0xc000200080, 0xc000032300, 0xc000032480)
    /export/home/tomcat/TeamCity/Agent4/work/c6e6852d9a243866/dataloader/apps/public/Datasets/datasets/datasets/Rehydrate.go:216 +0x107
created by main/datasets/datasets.downloadMultipleFiles
    /export/home/tomcat/TeamCity/Agent4/work/c6e6852d9a243866/dataloader/apps/public/Datasets/datasets/datasets/Rehydrate.go:241 +0x165

I'm using datasets 12.6.0, and my command line was:

./datasets download genome taxon "Campylobacter coli" --exclude-gff3 --exclude-rna --exclude-protein --dehydrated

Is there a fix for this problem? I'm trying to download all genomes for this bacterial species for comparative genomic studies, so downloading them from the website would be complicated.

Thank you!

VPEMERIDIAN avatar Jul 30 '21 14:07 VPEMERIDIAN

Hi VPEMERIDIAN,

Thanks for your feedback. I was unable to reproduce this exact error on my home computer. There is a known bug where the command-line tool reports gateway errors while trying to find sequence_report files that do not exist. However, you should still be able to download all available genomic sequence and annotation files.

We will continue trying to reproduce the error that you encountered and we plan to make time to improve the overall reliability of the tool soon.

Thanks again for your feedback.

-Eric

Eric Cox, PhD [Contractor] (he/him/his)
NCBI Datasets
Sequence Enhancements, Tools and Delivery (SeqPlus)
NIH/NLM/NCBI

ericcox1 avatar Aug 04 '21 18:08 ericcox1

I have the same issue. I have to redownload everything because there is no --resume option.

bounlu avatar Oct 25 '22 09:10 bounlu