parallel-fastq-dump
parallel-fastq-dump not working any more
Recently I found that parallel-fastq-dump is not working. I installed the latest version from conda.
hello,
please give me more details, command line, error messages, SRA ids you tried, etc.
I installed the tool using conda, and it worked before. Now it seems to have some issues. My command is:
parallel-fastq-dump --sra-id SRR10024973 --threads 4 --outdir out/ --split-files --gzip
The error is below:
2022-07-26 14:24:00,050 - SRR ids: ['SRR10024973']
2022-07-26 14:24:00,051 - extra args: ['--split-files', '--gzip']
2022-07-26 14:24:00,051 - tempdir: /tmp/pfd_g54g5deb
2022-07-26 14:24:00,051 - CMD: sra-stat --meta --quick SRR10024973
Traceback (most recent call last):
File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 116, in get_spot_count
total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 181, in
--sra-stat STDERR--
2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:00 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:00 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:01 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:01 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 int: connection failed while opening file within cryptographic module - 'SRR10024973'
looks like you are using sratools version 2.8.0, you need to update to a more recent version.
I tested with sratools 2.11.0 and it worked.
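The version can be read from the stderr lines themselves, since each sra-tools message carries the tool name and version after the timestamp (for example sra-stat.2.8.0). A minimal Python sketch, assuming that prefix format holds; the helper name tool_version is hypothetical, and the 2.11.0 threshold is simply the version reported to work in this thread, not an official minimum:

```python
import re

# Matches the "<tool>.<major>.<minor>.<patch>" token that follows the
# timestamp in sra-tools stderr lines, e.g. "sra-stat.2.8.0".
# Assumption: the token always has this three-part numeric form.
TOOL_VERSION = re.compile(r"\b([a-z][a-z-]*)\.(\d+)\.(\d+)\.(\d+)\b")


def tool_version(stderr_line):
    """Return (tool_name, (major, minor, patch)), or None if no token is found."""
    match = TOOL_VERSION.search(stderr_line)
    if match is None:
        return None
    name = match.group(1)
    version = tuple(int(part) for part in match.groups()[1:])
    return name, version


line = ("2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed "
        "while opening file within cryptographic module")
name, version = tool_version(line)
# Tuples compare element by element, so (2, 8, 0) sorts before (2, 11, 0).
print(name, version, version >= (2, 11, 0))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.8.0" would sort after "2.11.0".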
I installed the software using conda, so how can I update this within the conda env?
with the env activated, try:
conda install 'sra-tools>=2.11.0'
In the end, "conda install -c bioconda sra-tools=2.10" worked.
hello @guandailu @rvalieris I installed sratools 2.10, but the errors persist. Could you please help me with this issue? Thanks!
# install parallel-fastq-dump and sra-tools v2.10
conda config --add channels bioconda
conda install parallel-fastq-dump
conda install -c bioconda sra-tools=2.10
parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
main()
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
pfd(args, si, extra_args)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
n_spots = get_spot_count(srr_id)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--
--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'
My installation steps are:
conda install -c bioconda parallel-fastq-dump -n parallel-fastq-dump -m
conda install -c bioconda sra-tools=2.10 -n parallel-fastq-dump
To use it:
conda activate parallel-fastq-dump
parallel-fastq-dump -h
this is a dbGaP controlled-access file, you need permission to download it.
if you already have access set up, you need to change into the directory configured in vdb-config and run the command from there, for example:
cd /mnt/d/HYJ/dbGap/sra/
parallel-fastq-dump --sra-id SRR15652839 --threads 16 --outdir out --split-files --gzip
hello @rvalieris Thanks for the response. Yes, this is dbGaP-controlled data and we have access to download all of it. The weird thing is that we downloaded 453 of the 456 files and successfully converted them to fastq with parallel-fastq-dump v0.6.7 and sratools v2.8.0 (installed automatically with parallel-fastq-dump):
parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRRxxxx.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
However, these 3 SRA files (SRR15652839, SRR15653095, SRR15653115) could not be downloaded until the dbGaP team reloaded them last week. We then ran the same command, but got these errors:
parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
main()
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
pfd(args, si, extra_args)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
n_spots = get_spot_count(srr_id)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--
--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'
I followed your suggestion and changed into the directory I configured, but the file still cannot be converted:
hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id SRR15652839 --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:47:18,820 - SRR ids: ['SRR15652839']
2022-09-08 10:47:18,820 - extra args: ['--split-files', '--gzip']
2022-09-08 10:47:18,825 - tempdir: /tmp/pfd_uk3ma0fl
2022-09-08 10:47:18,825 - CMD: sra-stat --meta --quick SRR15652839
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
main()
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
pfd(args, si, extra_args)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
n_spots = get_spot_count(srr_id)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--
--sra-stat STDERR--
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 int: directory not found while opening manager within virtual file system module - 'SRR15652839'
hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:48:03,947 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-08 10:48:03,947 - extra args: ['--split-files', '--gzip']
2022-09-08 10:48:03,952 - tempdir: /tmp/pfd_au0bpbmv
2022-09-08 10:48:03,952 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
main()
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
pfd(args, si, extra_args)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
n_spots = get_spot_count(srr_id)
File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--
--sra-stat STDERR--
2022-09-08T15:48:04 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'
I think that the SRA team might have changed something in the SRA files, so that parallel-fastq-dump
only works for the old ones and not the new ones. Is it possible to solve this issue?
Thanks!
Best,
YJ
I see, try to run this command to see what happens:
sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
this should return a table with the number of reads/spots. parallel-fastq-dump uses this to know how many reads to give each thread, but this error: IndexError: list index out of range
indicates the output is not what was expected.
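To make the failure concrete: the traceback shows the spot count being read from the third '|'-separated field of each sra-stat output line, so when sra-stat prints nothing to stdout there is no such field to index. A minimal Python sketch of that parsing with an explicit check, assuming the line layout of the working SRR11770344 example in this thread; count_spots is a hypothetical helper, not the actual parallel-fastq-dump code:

```python
def count_spots(sra_stat_stdout):
    """Sum the spot counts found in `sra-stat --meta --quick` stdout.

    Assumption: each data line looks like
        <path-or-accession>|<member>|<spots>:<bases>:<bases>|...
    so the spot count is the first ':'-separated number of the third field.
    """
    total = 0
    parsed = 0
    for line in sra_stat_stdout.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("|")
        if len(fields) < 3:
            raise ValueError("unexpected sra-stat line: %r" % line)
        total += int(fields[2].split(":")[0])
        parsed += 1
    if parsed == 0:
        # Empty stdout usually means sra-stat failed; its stderr has the reason.
        raise ValueError("sra-stat printed no table, check its stderr")
    return total


sample = ("/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||"
          "153886278:19081898472:19081898472|:|:|:")
print(count_spots(sample))  # prints 153886278
```

With the failing files, stdout is empty and only stderr carries the "encryption key" message, which is why the parser has nothing to work with.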
thanks for the quick response, @rvalieris. Please see the results:
hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
2022-09-08T16:13:11 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'
Here is also a positive control, a file that I can convert to fastq with parallel-fastq-dump:
hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR11770344.sra
/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||153886278:19081898472:19081898472|:|:|:
Thanks!
I think this could be due to a change in sra-tools 2.10.0, maybe this will help: https://github.com/ncbi/sra-tools/wiki/First-help-on-decryption-dbGaP-data
hello @rvalieris, thanks for the information. I tried that approach in the Windows cmd prompt. It didn't work either. I'm emailing the SRA team about this issue.
C:\Users\Park_Lab\Downloads\sratoolkit.3.0.0-win64\bin>fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc D:\HYJ\dbGap\sra\SRR15653115.sra
2022-09-08T17:23:42 fasterq-dump.3.0.0 err: libs/vfs/names4-response.c:2273:Response4StatusInit: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )
Failed to call external services.
I think that the SRA team changed the encryption algorithm for SRA files, which makes parallel-fastq-dump only work for the old encrypted SRA files. That's also why they introduced sra-tools v3.0. Is there any plan to upgrade parallel-fastq-dump?
Thanks!
Best,
YJ
error unexpected while resolving query within virtual file system module - No accession to process
try to use just the SRR id instead of the path.
you could also try the previous parallel-fastq-dump cmdline, but add the --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc argument.
if none of this works then contacting the sra team seems like the best idea. parallel-fastq-dump uses sra-tools internally as well, so if you can get fastq-dump/fasterq-dump to work, parallel-fastq-dump should work too.
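For context on why a working fastq-dump is the prerequisite: the spot count from sra-stat is divided among the threads, and each thread then dumps its own slice. A minimal Python sketch of that division, assuming each worker would call fastq-dump with its -N (first spot) and -X (last spot) options; split_spots is a hypothetical helper, and the exact command line parallel-fastq-dump builds may differ:

```python
def split_spots(n_spots, n_threads):
    """Return inclusive, 1-based (first, last) spot ranges, one per worker."""
    if n_spots < 1 or n_threads < 1:
        raise ValueError("n_spots and n_threads must be positive")
    n_chunks = min(n_spots, n_threads)  # never create empty ranges
    base, extra = divmod(n_spots, n_chunks)
    ranges = []
    start = 1
    for i in range(n_chunks):
        # The first `extra` chunks take one additional spot each.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Illustrative only: build the per-worker commands without running them.
for first, last in split_spots(153886278, 16):
    cmd = ["fastq-dump", "-N", str(first), "-X", str(last),
           "--split-files", "--gzip", "SRR11770344"]
    print(" ".join(cmd))
```

If sra-stat or fastq-dump cannot open the file, every one of these per-range calls fails in the same way, so fixing access at the sra-tools level comes first.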
hello @rvalieris The SRA team told me that some current SRA files are no longer supported by sra-toolkit versions below 3.0. That's why parallel-fastq-dump doesn't work. I ran the command below, and it works:
fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc SRR15652839 SRR15653095 SRR15653115 --threads 16 --outdir D:\HYJ\dbGap\sra\ --split-files --include-technical
Thanks!
Best,
YJ