GetOrganelle
Speedup suggestion during initial FASTQ decompression
Thanks for developing GetOrganelle, it seems very complete and thorough. I am trying it for species of Ericaceae, hopefully it will handle the small repeats better than other software I tried in the past (any tips to improve these assemblies are welcome).
However, during my initial tests on a Mac I noticed it takes an excessive amount of time just to decompress the FASTQ files at the beginning (a file of ~5 GB is taking more than 1.5 hours). My guess is that the combination of Mac's head + gunzip is the reason; I have found that many of Mac's standard command-line programs are really slow compared to Linux's versions. My suggestion would be to use Python's own gzip library to decompress and compress reads more quickly. Alternatively, the BBTools suite (https://jgi.doe.gov/data-and-tools/bbtools/) handles FASTQ files very fast as well, and a random subsampling could be performed with its program reformat.sh.
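For illustration, the "use Python's gzip library instead of head + gunzip" idea could look roughly like this (a minimal sketch, not GetOrganelle's actual code; the function name and signature are made up for this example):

```python
import gzip
from itertools import islice

def head_fastq_gz(in_path, n_reads, out_path):
    """Write the first n_reads records of a gzipped FASTQ to out_path.

    Uses Python's built-in gzip module (zlib under the hood) rather
    than shelling out to `head` + `gunzip`, which can be slow on macOS.
    Each FASTQ record spans exactly 4 lines.
    """
    with gzip.open(in_path, "rt") as fin, open(out_path, "w") as fout:
        # islice stops after n_reads * 4 lines, so the rest of the
        # file is never decompressed.
        for line in islice(fin, n_reads * 4):
            fout.write(line)
```

Because islice stops reading as soon as the requested records are copied, only the head of the compressed stream is ever decompressed, which is the main saving over decompressing the whole file first.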
Edgardo
Hi Edgardo,
Thanks for using GetOrganelle and for the kind suggestion. I will carefully consider and test it.
As for Ericaceae, it will still be difficult with only Illumina data. I am developing another tool/function that utilizes long-read sequencing data for this. Hopefully it will be helpful if you have that kind of data.
Best, Jianjun
Hi, I ran GetOrganelle and encountered the following error:
GetOrganelle v1.7.1
get_organelle_from_reads.py assembles organelle genomes from genome skimming data. Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.
Python 3.7.6 | packaged by conda-forge | (default, Jun 1 2020, 18:57:50) [GCC 7.5.0]
PYTHON LIBS: GetOrganelleLib 1.7.1; numpy 1.19.1; sympy 1.6.2; scipy 1.3.0; psutil 5.4.7
DEPENDENCIES: Bowtie2 /public/home/aaa/anaconda3/bin/bowtie2-align-s); SPAdes 3.13.0; Blast 2.9.0
LABEL DB: embplant_mt customized; embplant_pt customized
WORKING DIR: /public/home/aaa/project/01_tea/DASZ_mt/assemble
/public/home/aaa/anaconda3/bin/get_organelle_from_reads.py -s tea.mt.fasta -1 DASZ.R1.fastq.gz -2 DASZ.R2.fastq.gz -o DASZ_mt -R 50 -k 55,85,115,125,135 -F embplant_mt -t 6
2020-09-29 12:59:33,138 - INFO: Pre-reading fastq ...
2020-09-29 12:59:33,139 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf')
2020-09-29 12:59:33,365 - INFO: Tasting 100000+100000 reads ...
2020-09-29 12:59:34,205 - ERROR: Traceback (most recent call last):
  File "/public/home/fafu_chenshuai/anaconda3/bin/get_organelle_from_reads.py", line 3750, in main
    random_seed=options.random_seed, verbose_log=options.verbose_log, log_handler=log_handler)
  File "/public/home/fafu_chenshuai/anaconda3/bin/get_organelle_from_reads.py", line 1014, in estimate_maximum_n_reads_using_mapping
    which_bowtie2=which_bowtie2)
  File "/public/home/fafu_chenshuai/anaconda3/lib/python3.7/site-packages/GetOrganelleLib/pipe_control_func.py", line 373, in map_with_bowtie2
    raise Exception("")
Exception
Total cost 26.55 s
Please email [email protected] or [email protected] if you find bugs!
I'm sorry, but your question is unrelated to this issue. Please open a new issue; I will have to delete your question here soon.
@Kinggerm Does GetOrganelle pull in pigz as well when installing via conda? If so, that would be a lot better, as pigz is foolishly fast!
@harish0201 That's true. But currently pigz is not required for non-conda installations, and incorporating it would need more testing across different environments; it's on my plan, though. Thanks for the suggestion.
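The "optional pigz, fall back to gzip" pattern discussed above could be sketched as follows (an illustrative example only, not GetOrganelle's implementation; the function name is made up):

```python
import gzip
import shutil
import subprocess

def open_fastq_gz(path):
    """Return a text file handle over a gzipped FASTQ.

    Prefers the external pigz tool when it is on PATH (pigz is
    generally faster than gzip, using extra threads even when
    decompressing), and falls back to Python's gzip module so the
    code still works where pigz is not installed.
    """
    if shutil.which("pigz"):
        # `pigz -dc` decompresses to stdout; stream it line by line.
        proc = subprocess.Popen(["pigz", "-dc", path],
                                stdout=subprocess.PIPE, text=True)
        return proc.stdout
    return gzip.open(path, "rt")
```

Keeping the fallback means pigz stays an optional accelerator rather than a hard dependency, which matches the concern about non-conda installations.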