parallel-fastq-dump not working any more

Open FarmOmics opened this issue 1 year ago • 6 comments

Recently I found that parallel-fastq-dump is not working. I installed the most recent version from conda.

FarmOmics avatar Jul 23 '22 17:07 FarmOmics

hello,

please give me more details, command line, error messages, SRA ids you tried, etc.

rvalieris avatar Jul 25 '22 12:07 rvalieris

I installed the tool using conda, and it worked before. Now it seems to have some issues. My command is:

parallel-fastq-dump --sra-id SRR10024973 --threads 4 --outdir out/ --split-files --gzip

The error is below:

2022-07-26 14:24:00,050 - SRR ids: ['SRR10024973']
2022-07-26 14:24:00,051 - extra args: ['--split-files', '--gzip']
2022-07-26 14:24:00,051 - tempdir: /tmp/pfd_g54g5deb
2022-07-26 14:24:00,051 - CMD: sra-stat --meta --quick SRR10024973
Traceback (most recent call last):
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/dguan/anaconda3/envs/parallel-fastq-dump/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:00 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:00 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 sys: connection failed while opening file within cryptographic module - mbedtls_ssl_handshake returned -9984 ( X509 - Certificate verification failed, e.g. CRL, CA or signature check failed )
2022-07-26T21:24:01 sra-stat.2.8.0 sys: mbedtls_ssl_get_verify_result returned 0x8 ( !! The certificate is not correctly signed by the trusted CA )
2022-07-26T21:24:01 sra-stat.2.8.0 err: no error - error with http open 'https://sra-pub-sars-cov2.s3.amazonaws.com/run/SRR10024973/SRR10024973'
2022-07-26T21:24:01 sra-stat.2.8.0 int: connection failed while opening file within cryptographic module - 'SRR10024973'

FarmOmics avatar Jul 26 '22 21:07 FarmOmics

looks like you are using sratools version 2.8.0, you need to update to a more recent version.

I tested with sratools 2.11.0 and it worked.
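
The installed version can be read off the STDERR prefix of each log line (`sra-stat.2.8.0`). Here is a minimal sketch of checking that prefix against the 2.11.0 release tested above. It assumes the `tool.major.minor.patch` prefix format seen in this thread; `tool_version` and `MIN_TESTED` are hypothetical names and not part of parallel-fastq-dump or sra-tools.

```python
import re

# Release reported as working in this thread; treating it as a lower bound is an assumption.
MIN_TESTED = (2, 11, 0)

# Tool names taken from the log lines quoted in this thread.
_PREFIX = re.compile(r"(?:sra-stat|fastq-dump|fasterq-dump)\.(\d+)\.(\d+)\.(\d+)")


def tool_version(log_line):
    """Return (major, minor, patch) parsed from an sra-tools log line, or None."""
    match = _PREFIX.search(log_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


line = "2022-07-26T21:24:00 sra-stat.2.8.0 sys: connection failed while opening file"
version = tool_version(line)
if version is not None and version < MIN_TESTED:
    print("sra-tools", version, "is older than the tested release", MIN_TESTED)
```

Tuples compare element by element, so (2, 8, 0) sorts before (2, 11, 0) even though the string "2.8.0" would not.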

rvalieris avatar Jul 27 '22 16:07 rvalieris

I installed the software using conda, so how can I update it within the conda env?

FarmOmics avatar Jul 31 '22 07:07 FarmOmics

with the env activated, try: conda install 'sra-tools>=2.11.0'

rvalieris avatar Aug 01 '22 12:08 rvalieris

Finally, "conda install -c bioconda sra-tools=2.10" works.

FarmOmics avatar Aug 17 '22 14:08 FarmOmics

hello @guandailu @rvalieris I installed sratools 2.10, but the errors still continue. Could you please help me with this issue? Thanks!

# install parallel-fastq-dump and sra-tools v2.10
conda config --add channels bioconda
conda install parallel-fastq-dump
conda install -c bioconda sra-tools=2.10

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

hyjforesight avatar Sep 07 '22 17:09 hyjforesight

My installation steps are:

conda install -c bioconda parallel-fastq-dump -n parallel-fastq-dump -m
conda install -c bioconda sra-tools=2.10 -n parallel-fastq-dump

To use it:

conda activate parallel-fastq-dump
parallel-fastq-dump -h

FarmOmics avatar Sep 07 '22 19:09 FarmOmics

this is a dbGap controlled file, you need permission to download it.

if you already have the access setup, you need to go inside the directory configured in vdb-config and execute inside there like this, for example:

cd /mnt/d/HYJ/dbGap/sra/
parallel-fastq-dump --sra-id SRR15652839  --threads 16 --outdir out --split-files --gzip

rvalieris avatar Sep 08 '22 13:09 rvalieris

hello @rvalieris Thanks for the response. Yes, this is dbGap-controlled data and we have access to download all of it. The weird thing is that we downloaded 453 of the 456 files and succeeded in converting them to fastq with parallel-fastq-dump v0.6.7 and sratools v2.8.0 (installed automatically along with parallel-fastq-dump):

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRRxxxx.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip

However, these 3 SRA files (SRR15652839, SRR15653095, SRR15653115) could not be downloaded until the dbGap team reloaded them last week. We then used the same command, but got these errors:

parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-07 12:21:28,266 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-07 12:21:28,266 - extra args: ['--split-files', '--gzip']
2022-09-07 12:21:28,270 - tempdir: /tmp/pfd_hjas65p0
2022-09-07 12:21:28,270 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

I followed your suggestion and ran it from inside the directory I configured, but it still cannot be converted:

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id SRR15652839 --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:47:18,820 - SRR ids: ['SRR15652839']
2022-09-08 10:47:18,820 - extra args: ['--split-files', '--gzip']
2022-09-08 10:47:18,825 - tempdir: /tmp/pfd_uk3ma0fl
2022-09-08 10:47:18,825 - CMD: sra-stat --meta --quick SRR15652839
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 err: query unauthorized while resolving query within virtual file system module - failed to resolve accession 'SRR15652839' - Access denied - please request permission to access phs002407 / GRU in dbGaP. ( 403 )
2022-09-08T15:47:20 sra-stat.2.10.0 int: directory not found while opening manager within virtual file system module - 'SRR15652839'
hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ parallel-fastq-dump --sra-id /mnt/d/HYJ/dbGap/sra/SRR15652839.sra /mnt/d/HYJ/dbGap/sra/SRR15653095.sra /mnt/d/HYJ/dbGap/sra/SRR15653115.sra --threads 16 --outdir /mnt/d/HYJ/dbGap/sra/ --split-files --gzip
2022-09-08 10:48:03,947 - SRR ids: ['/mnt/d/HYJ/dbGap/sra/SRR15652839.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653095.sra', '/mnt/d/HYJ/dbGap/sra/SRR15653115.sra']
2022-09-08 10:48:03,947 - extra args: ['--split-files', '--gzip']
2022-09-08 10:48:03,952 - tempdir: /tmp/pfd_au0bpbmv
2022-09-08 10:48:03,952 - CMD: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 116, in get_spot_count
    total += int(l.split('|')[2].split(':')[0])
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 181, in <module>
    main()
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 175, in main
    pfd(args, si, extra_args)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 49, in pfd
    n_spots = get_spot_count(srr_id)
  File "/home/hyjforesight/anaconda3/envs/test/bin/parallel-fastq-dump", line 122, in get_spot_count
    raise IndexError(msg.format('\n'.join(txt), '\n'.join(etxt)))
IndexError: sra-stat output parsing error!
--sra-stat STDOUT--

--sra-stat STDERR--
2022-09-08T15:48:04 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

I think that the SRA team might have changed something in the SRA files, so that parallel-fastq-dump only works for the old ones and not the new ones. Is it possible to solve this issue? Thanks! Best, YJ

hyjforesight avatar Sep 08 '22 15:09 hyjforesight

I see, try to run this command to see what happens: sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra

this should return a table with the number of reads/spot, parallel-fastq-dump uses this to know how many reads per thread to use, but this error: IndexError: list index out of range indicates the output is not what was expected.
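
The parsing step that fails can be sketched as follows. This is a simplified illustration, not the tool's actual source. It mirrors the expression shown in the traceback (`l.split('|')[2].split(':')[0]`) and assumes each `sra-stat --meta --quick` line has the form `name|...|spots:bases:bases|...`. The helper names `spot_count` and `split_ranges` are hypothetical, and the way the real tool partitions spots across threads may differ.

```python
def spot_count(stat_stdout):
    """Sum the spot counts over all lines of `sra-stat --meta --quick` output."""
    total = 0
    for line in stat_stdout.rstrip().split("\n"):
        fields = line.split("|")
        if len(fields) < 3:
            # Empty or unexpected output, e.g. when sra-stat only wrote to STDERR.
            # Indexing fields[2] here is what raises IndexError in the traceback.
            raise ValueError("unexpected sra-stat output: %r" % line)
        total += int(fields[2].split(":")[0])
    return total


def split_ranges(n_spots, n_threads):
    """Divide spots 1..n_spots into contiguous (first, last) blocks, one per thread."""
    base, extra = divmod(n_spots, n_threads)
    ranges = []
    start = 1
    for i in range(n_threads):
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more threads than spots
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Output line from a file that converts correctly, as posted in this thread.
sample = "/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||153886278:19081898472:19081898472|:|:|:"
n_spots = spot_count(sample)
print(n_spots)
print(split_ranges(n_spots, 4))
```

When STDOUT is empty, as in the failing runs, there is no third field to read, so no per-thread ranges can be computed until sra-stat itself succeeds.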

rvalieris avatar Sep 08 '22 16:09 rvalieris

thanks for the quick response, @rvalieris Please see the results

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR15652839.sra
2022-09-08T16:13:11 sra-stat.2.10.0 int: item not found while retrieving encryption key within configuration module - '/mnt/d/HYJ/dbGap/sra/SRR15652839.sra'

Here is also a positive control, a file that I can convert to fastq with parallel-fastq-dump:

hyjforesight@W10D-GW97ZC3:/mnt/d/HYJ/dbGap/sra$ sra-stat --meta --quick /mnt/d/HYJ/dbGap/sra/SRR11770344.sra
/mnt/d/HYJ/dbGap/sra/SRR11770344.sra||153886278:19081898472:19081898472|:|:|:

Thanks!

hyjforesight avatar Sep 08 '22 16:09 hyjforesight

I think this could be due to a change in sra-tools 2.10.0, maybe this will help: https://github.com/ncbi/sra-tools/wiki/First-help-on-decryption-dbGaP-data

rvalieris avatar Sep 08 '22 16:09 rvalieris

hello @rvalieris, thanks for the information. I tried that approach in the Windows command prompt. It didn't work either. I'm emailing the SRA team about this issue.

C:\Users\Park_Lab\Downloads\sratoolkit.3.0.0-win64\bin>fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc D:\HYJ\dbGap\sra\SRR15653115.sra
2022-09-08T17:23:42 fasterq-dump.3.0.0 err: libs/vfs/names4-response.c:2273:Response4StatusInit: error unexpected while resolving query within virtual file system module - No accession to process ( 500 )
Failed to call external services.

I think that the SRA team changed the encryption algorithm for SRA files, so parallel-fastq-dump only works for the old encrypted SRA files. That's also why they introduced sra-tools v3.0. Is there any plan to upgrade parallel-fastq-dump? Thanks! Best, YJ

hyjforesight avatar Sep 08 '22 17:09 hyjforesight

error unexpected while resolving query within virtual file system module - No accession to process

try to use just the SRR id instead of the path.

you could try also the previous parallel-fastq-dump cmdline, but add the --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc argument.

if none of this works then contacting sra team seems like the best idea, parallel-fastq-dump is using sra-tools internally as well, if you can get fastq-dump/ fasterq-dump to work parallel-fastq-dump should work too.
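
On why adding `--ngc` to the parallel-fastq-dump command line might work: the log lines in this thread show `extra args: ['--split-files', '--gzip']`, which suggests that options the wrapper does not recognize are forwarded to fastq-dump. Below is a minimal sketch of that pass-through pattern using `argparse.parse_known_args`, assuming forwarding works this way. The parser declares only a few of the wrapper's options, `build_fastq_dump_cmd` is a hypothetical helper, and `prj_32846.ngc` stands in for the full path to the key file. `-N`, `-X` and `-O` are fastq-dump's first-spot, last-spot and output-directory options.

```python
import argparse


def build_fastq_dump_cmd(argv, first_spot, last_spot):
    """Build one fastq-dump command for a spot range, forwarding unknown options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-s", "--sra-id", nargs="+", required=True)
    parser.add_argument("-t", "--threads", type=int, default=1)
    parser.add_argument("-O", "--outdir", default=".")
    # Options not declared above end up in extra_args, in their original order.
    args, extra_args = parser.parse_known_args(argv)
    cmd = ["fastq-dump", "-N", str(first_spot), "-X", str(last_spot), "-O", args.outdir]
    return cmd + extra_args + [args.sra_id[0]]


argv = [
    "--sra-id", "SRR15652839", "--threads", "16", "--outdir", "out",
    "--split-files", "--gzip", "--ngc", "prj_32846.ngc",
]
print(" ".join(build_fastq_dump_cmd(argv, 1, 1000)))
```

Whether the forwarded `--ngc` is honored depends on the installed fastq-dump, so this only helps if the underlying sra-tools release accepts that option.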

rvalieris avatar Sep 08 '22 17:09 rvalieris

hello @rvalieris The SRA team told me that some current SRA files are not supported by sra-toolkit < 3.0 now. That's why parallel-fastq-dump doesn't work. I ran the command below, and it works:

fasterq-dump --ngc C:\Users\Park_Lab\Downloads\prj_32846.ngc SRR15652839 SRR15653095 SRR15653115 --threads 16 --outdir D:\HYJ\dbGap\sra\ --split-files --include-technical

Thanks! Best, YJ

hyjforesight avatar Sep 08 '22 20:09 hyjforesight
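
The STDERR messages that appeared in this thread can be collected into a small triage helper. This is only a sketch summarizing the suggestions made above, not an official list of sra-tools errors; `HINTS` and `triage` are hypothetical names.

```python
# Substrings of sra-tools STDERR messages seen in this thread, each paired with
# the suggestion that was given for it here.
HINTS = [
    ("Certificate verification failed",
     "update sra-tools (2.11.0 was tested; 2.10 resolved it for the reporter)"),
    ("retrieving encryption key",
     "dbGaP data: pass the project .ngc file with --ngc; some files need sra-tools >= 3.0"),
    ("Access denied",
     "request dbGaP permission, or run from the directory configured in vdb-config"),
    ("No accession to process",
     "pass the SRR id instead of a file path"),
]


def triage(stderr_text):
    """Return the suggestions whose message substring occurs in stderr_text."""
    return [hint for needle, hint in HINTS if needle in stderr_text]


stderr = ("2022-09-07T17:21:29 sra-stat.2.10.0 int: item not found while "
          "retrieving encryption key within configuration module")
for hint in triage(stderr):
    print(hint)
```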