GetOrganelle

mitochondria assembly with pacbio

Open pollend opened this issue 4 years ago • 1 comments

We're trying to assemble the sequences using PacBio data, and I've run this twice: once with the short reads and once with only the PacBio reads.

without short reads:

GetOrganelle v1.6.2b

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25)  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Python libs: numpy 1.14.3; sympy 1.3; scipy 1.2.0; psutil 5.4.8
Dependencies: Bowtie2 2.3.5.1; SPAdes 3.13.0; Blast 2.2.30
./GetOrganelle/get_organelle_from_reads.py -u ./pacbio/m54048_180110_204750.fq ./pacbio/m54048_180117_052827.fq ./pacbio/m54048_180117_153809.fq ./pacbio/m54048_180121_041959.fq ./pacbio/m54048_180121_142957.fq ./pacbio/m54048_180122_105730.fq ./pacbio/m54048_180122_210722.fq ./pacbio/m54048_180124_232643.fq ./pacbio/m54048_180125_093419.fq ./pacbio/m54048_180125_194409.fq -F embplant_mt -R 50 --memory-save -k 19,21,23,25,27,61,111 -s ./mitochondrion.fasta -o mitochondrion_pacbio_jul_16_2019

2019-07-17 23:21:10,261 - INFO: Pre-reading fastq ...
2019-07-17 23:21:10,262 - INFO: Estimating reads to use ...
2019-07-17 23:21:11,462 - INFO: Estimating reads to use finished.
2019-07-17 23:21:46,944 - INFO: Counting read qualities ...
2019-07-17 23:21:51,742 - INFO: Identified quality encoding format = Sanger
2019-07-17 23:21:51,823 - INFO: Resetting '--min-quality-score 0'
2019-07-17 23:21:53,259 - INFO: Mean error rate = 1.0
2019-07-17 23:21:53,279 - INFO: Counting read lengths ...
2019-07-17 23:21:55,899 - INFO: Mean = 11859.6 bp, maximum = 65068 bp.
2019-07-17 23:21:55,900 - INFO: Reads used = 139991
2019-07-17 23:21:55,900 - INFO: Pre-reading fastq finished.

2019-07-17 23:21:55,900 - INFO: Making seed reads ...
2019-07-17 23:21:56,074 - INFO: Making seed - bowtie2 index ...
2019-07-17 23:21:56,744 - INFO: Making seed - bowtie2 index finished.
2019-07-17 23:21:56,745 - INFO: Mapping reads to seed bowtie2 index ...
2019-07-18 04:48:18,842 - INFO: Mapping finished.
2019-07-18 04:48:18,843 - INFO: Seed reads made: mitochondrion_pinta_pacbio_jul_16_2019/seed/embplant_mt.initial.fq (9318040 bytes)
2019-07-18 04:48:18,844 - INFO: Making seed reads finished.

2019-07-18 04:48:18,844 - INFO: Checking seed reads and parameters ...
2019-07-18 04:48:18,844 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2019-07-18 04:48:18,844 - INFO: If the result graph is not a circular organelle genome, 
2019-07-18 04:48:18,845 - INFO:   you could adjust the value(s) of '-w'/'-R' for another new run.
2019-07-18 04:48:29,498 - INFO: Pre-assembling mapped reads ...
2019-07-18 06:48:39,180 - WARNING: Pre-assembling failed. The estimations for embplant_mt-hitting base-coverage and word size may be misleading.
2019-07-18 06:48:40,499 - INFO: Estimated embplant_mt-hitting base-coverage = 329.92
2019-07-18 06:48:48,591 - ERROR: 
Traceback (most recent call last):
  File "/home/[email protected]/Hagop/./GetOrganelle/get_organelle_from_reads.py", line 3636, in main
    resume=resume, verbose_log=verb_log, zip_files=options.zip_files)
  File "/home/[email protected]/Hagop/./GetOrganelle/get_organelle_from_reads.py", line 1459, in check_parameters
    wc_bc_ratio_constant=wc_bc_ratio_constant, organelle_type=organelle_types[go_type])
  File "/home/[email protected]/Hagop/./GetOrganelle/get_organelle_from_reads.py", line 1091, in estimate_word_size
    estimated_word_size = int(read_length * (1 - word_cov / base_cov)) + 1
ValueError: cannot convert float NaN to integer

Total cost 26859.11 s
Please email [email protected] or [email protected] if you find bugs!
Please provide me with the get_org.log.txt file!
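
The traceback above ends with `int()` being handed a NaN. The sketch below reproduces only that failure mode. It assumes (the log does not confirm this) that the failed pre-assembly left one of the coverage inputs as NaN. Both functions are hypothetical stand-ins that mirror the single line quoted in the traceback; they are not GetOrganelle's actual `estimate_word_size`, and the guard is one possible way to fail softly rather than the project's fix.

```python
import math


def estimate_word_size_unguarded(read_length, word_cov, base_cov):
    # Same arithmetic as the line quoted in the traceback.
    return int(read_length * (1 - word_cov / base_cov)) + 1


def estimate_word_size_guarded(read_length, word_cov, base_cov):
    # Hypothetical guard: return None instead of crashing when the
    # estimate cannot be computed as a finite number.
    if base_cov == 0:
        return None
    value = read_length * (1 - word_cov / base_cov)
    if not math.isfinite(value):
        return None
    return int(value) + 1


if __name__ == "__main__":
    nan = float("nan")  # assumed value of a failed coverage estimate
    try:
        estimate_word_size_unguarded(11859.6, nan, 329.92)
    except ValueError as err:
        print("unguarded:", err)
    print("guarded:", estimate_word_size_guarded(11859.6, nan, 329.92))
```

With a guard like this, the caller could fall back to a user-supplied `-w` value instead of aborting after hours of mapping.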

with short reads:


GetOrganelle v1.6.2b

get_organelle_from_reads.py assembles organelle genomes from genome skimming data.
Find updates in https://github.com/Kinggerm/GetOrganelle and see README.md for more information.

Python 3.6.7 | packaged by conda-forge | (default, Nov 21 2018, 02:32:25)  [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
Python libs: numpy 1.14.3; sympy 1.3; scipy 1.2.0; psutil 5.4.8
Dependencies: Bowtie2 2.3.5.1; SPAdes 3.13.0; Blast 2.2.30
/home/[email protected]/Hagop/./GetOrganelle/get_organelle_from_reads.py -1 ./HA1_DSW64714-V_HT2GMCCXY_L4_1.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L5_1.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L6_1.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L7_1.fq.gz -2 ./HA1_DSW64714-V_HT2GMCCXY_L4_2.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L5_2.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L6_2.fq.gz ./HA1_DSW64714-V_HT2GMCCXY_L7_2.fq.gz -u ./pacbio/m54048_180110_204750.fq ./pacbio/m54048_180117_052827.fq ./pacbio/m54048_180117_153809.fq ./pacbio/m54048_180121_041959.fq ./pacbio/m54048_180121_142957.fq ./pacbio/m54048_180122_105730.fq ./pacbio/m54048_180122_210722.fq ./pacbio/m54048_180124_232643.fq ./pacbio/m54048_180125_093419.fq ./pacbio/m54048_180125_194409.fq -F embplant_mt -R 50 --memory-save -k 19,21,23,25,27,61,111 -s ./mitochondrion.fasta -o mitochondrion_pacbio_jul_16_2019_with_short_reads

2019-07-17 14:24:52,985 - INFO: Pre-reading fastq ...
2019-07-17 14:24:52,986 - INFO: Estimating reads to use ...
2019-07-17 14:24:54,407 - INFO: Tasting 100000+100000+100000 reads ...
2019-07-17 20:52:05,080 - INFO: Estimating reads to use finished.
2019-07-17 20:52:05,081 - INFO: Unzipping reads file: ./Pinta/HA1_DSW64714-V_HT2GMCCXY_L4_1.fq.gz (7417399283 bytes)
2019-07-17 20:53:23,926 - INFO: Unzipping reads file: ./Pinta/HA1_DSW64714-V_HT2GMCCXY_L4_2.fq.gz (8342958686 bytes)
2019-07-17 20:54:54,201 - INFO: Counting read qualities ...
2019-07-17 20:55:00,871 - INFO: Identified quality encoding format = Illumina 1.8+
2019-07-17 20:55:00,954 - INFO: Resetting '--min-quality-score 0'
2019-07-17 20:55:03,094 - INFO: Mean error rate = 0.9761
2019-07-17 20:55:03,117 - INFO: Counting read lengths ...
2019-07-17 20:56:43,024 - INFO: Mean = 160.7 bp, maximum = 61722 bp.
2019-07-17 20:56:43,024 - INFO: Reads used = 15726496+15726496+27774
2019-07-17 20:56:43,024 - INFO: Pre-reading fastq finished.

2019-07-17 20:56:43,025 - INFO: Making seed reads ...
2019-07-17 20:56:43,059 - INFO: Making seed - bowtie2 index ...
2019-07-17 20:56:43,687 - INFO: Making seed - bowtie2 index finished.
2019-07-17 20:56:43,688 - INFO: Mapping reads to seed bowtie2 index ...
2019-07-17 22:15:19,063 - INFO: Mapping finished.
2019-07-17 22:15:19,065 - INFO: Seed reads made: mitochondrion_pinta_pacbio_jul_16_2019_with_short_reads/seed/embplant_mt.initial.fq (754284740 bytes)
2019-07-17 22:15:19,065 - INFO: Making seed reads finished.

2019-07-17 22:15:19,065 - INFO: Checking seed reads and parameters ...
2019-07-17 22:15:19,066 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2019-07-17 22:15:19,066 - INFO: If the result graph is not a circular organelle genome, 
2019-07-17 22:15:19,066 - INFO:   you could adjust the value(s) of '-w'/'-R' for another new run.
2019-07-17 22:17:02,294 - INFO: Pre-assembling mapped reads ...
2019-07-17 22:22:28,112 - INFO: Pre-assembling mapped reads finished.
2019-07-17 22:22:28,112 - INFO: Estimated embplant_mt-hitting base-coverage = 1084.55
2019-07-17 22:22:28,112 - INFO: Reads reduced to = 7250236+7250236+12804
2019-07-17 22:22:28,112 - INFO: Adjusting expected embplant_mt base coverage to 500.00
2019-07-17 22:22:28,114 - INFO: Estimated word size(s): 120
2019-07-17 22:22:28,114 - INFO: Setting '-w 120'
2019-07-17 22:22:28,114 - INFO: Setting '--max-extending-len inf'
2019-07-17 22:22:32,109 - INFO: Checking seed reads and parameters finished.

2019-07-17 22:22:32,110 - INFO: Making read index ...
2019-07-17 22:23:10,076 - INFO: For mitochondrion_pinta_pacbio_jul_16_2019_with_short_reads/1-HA1_DSW64714-V_HT2GMCCXY_L4_1.fq.gz.fastq, only top 7250236 reads are used in downstream analysis.
2019-07-17 22:23:39,306 - INFO: For mitochondrion_pinta_pacbio_jul_16_2019_with_short_reads/2-HA1_DSW64714-V_HT2GMCCXY_L4_2.fq.gz.fastq, only top 7250236 reads are used in downstream analysis.
2019-07-17 22:23:40,034 - INFO: For mitochondrion_pinta_pacbio_jul_16_2019_with_short_reads/3-m54048_180110_204750.fq, only top 12804 reads are used in downstream analysis.
2019-07-17 22:24:10,217 - INFO: Mem 1.836 G, 14513259 reads
2019-07-17 22:24:11,278 - INFO: Making read index finished.

2019-07-17 22:24:11,278 - INFO: Extending ...
2019-07-17 22:24:11,278 - INFO: Adding initial words ...
2019-07-17 22:25:10,729 - INFO: AW 14208410
2019-07-17 22:27:11,914 - INFO: Round 1: 14513259/14513259 AI 1854057 AW 15495760 Mem 3.443
2019-07-17 22:28:40,790 - INFO: Round 2: 14513259/14513259 AI 2378131 AW 16148864 Mem 3.568
2019-07-17 22:30:07,553 - INFO: Round 3: 14513259/14513259 AI 2505435 AW 16338056 Mem 3.603
2019-07-17 22:31:32,237 - INFO: Round 4: 14513259/14513259 AI 2518460 AW 16387532 Mem 3.611
2019-07-17 22:32:56,658 - INFO: Round 5: 14513259/14513259 AI 2525373 AW 16415728 Mem 3.616
2019-07-17 22:34:20,164 - INFO: Round 6: 14513259/14513259 AI 2530083 AW 16436008 Mem 3.62
2019-07-17 22:35:45,335 - INFO: Round 7: 14513259/14513259 AI 2533065 AW 16449790 Mem 3.622
2019-07-17 22:37:11,266 - INFO: Round 8: 14513259/14513259 AI 2535577 AW 16460300 Mem 3.624
2019-07-17 22:38:33,329 - INFO: Round 9: 14513259/14513259 AI 2537491 AW 16469288 Mem 3.626
2019-07-17 22:39:56,800 - INFO: Round 10: 14513259/14513259 AI 2539066 AW 16475908 Mem 3.627
2019-07-17 22:41:20,945 - INFO: Round 11: 14513259/14513259 AI 2540401 AW 16481888 Mem 3.628
2019-07-17 22:42:44,312 - INFO: Round 12: 14513259/14513259 AI 2541787 AW 16487286 Mem 3.629
2019-07-17 22:44:07,382 - INFO: Round 13: 14513259/14513259 AI 2543128 AW 16492716 Mem 3.63
2019-07-17 22:45:31,855 - INFO: Round 14: 14513259/14513259 AI 2544382 AW 16497580 Mem 3.631
2019-07-17 22:46:55,018 - INFO: Round 15: 14513259/14513259 AI 2545237 AW 16501210 Mem 3.631
2019-07-17 22:48:15,244 - INFO: Round 16: 14513259/14513259 AI 2545754 AW 16503414 Mem 3.632
2019-07-17 22:49:41,764 - INFO: Round 17: 14513259/14513259 AI 2545877 AW 16504016 Mem 3.632
2019-07-17 22:51:02,758 - INFO: Round 18: 14513259/14513259 AI 2545925 AW 16504570 Mem 3.632
2019-07-17 22:52:26,045 - INFO: Round 19: 14513259/14513259 AI 2545956 AW 16504756 Mem 3.632
2019-07-17 22:53:45,538 - INFO: Round 20: 14513259/14513259 AI 2545989 AW 16505104 Mem 3.632
2019-07-17 22:55:04,942 - INFO: Round 21: 14513259/14513259 AI 2546001 AW 16505246 Mem 3.632
2019-07-17 22:56:23,664 - INFO: Round 22: 14513259/14513259 AI 2546005 AW 16505294 Mem 3.632
2019-07-17 22:57:45,126 - INFO: Round 23: 14513259/14513259 AI 2546005 AW 16505294 Mem 3.632
2019-07-17 22:57:45,127 - INFO: No more reads found and terminated ...
2019-07-17 22:58:31,637 - INFO: Extending finished.

2019-07-17 22:58:32,672 - INFO: Separating filtered fastq file ... 
2019-07-17 22:58:53,638 - INFO: Setting '-k 21,23,25,27,61,111'
2019-07-17 22:58:53,639 - INFO: Assembling using SPAdes ...
2019-07-17 23:21:01,904 - ERROR: Error in SPAdes: 
== Error ==  system call for: "['/home/[email protected]/Hagop/GetOrganelle/GetOrganelleDep/linux/SPAdes/bin/spades-core', '/home/[email protected]/Hagop/mitochondrion_pinta_pacbio_jul_16_2019_with_short_reads/filtered_spades/K21/configs/config.info']" finished abnormally, err code: 255


2019-07-17 23:21:01,904 - ERROR: Assembling failed.

Total cost 32169.49 s
Thank you!

pollend, Jul 19 '19 15:07

Hi, GetOrganelle currently does not support third-generation sequencing data (-u is for single-end Illumina reads). I will let you know once we support it. BTW, multiple input files should be joined with commas, like -1 ./HA1_DSW64714-V_HT2GMCCXY_L4_1.fq.gz,./HA1_DSW64714-V_HT2GMCCXY_L5_1.fq.gz,./HA1_DSW64714-V_HT2GMCCXY_L6_1.fq.gz,./HA1_DSW64714-V_HT2GMCCXY_L7_1.fq.gz; otherwise, the latter 3 files would not be used.
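
For anyone scripting this, here is a minimal sketch of building the argument list with the comma-joined form described in the reply above. The helper name `build_getorganelle_args` and the output directory name are hypothetical; the flags and their values are copied from the command in the original post, and the PacBio files are left out because the reply says they are not supported. The sketch only builds the list and does not run anything.

```python
def build_getorganelle_args(forward, reverse, seed, out_dir):
    """Build an argument list with each read-file group joined by commas."""
    if len(forward) != len(reverse):
        raise ValueError("forward and reverse file lists must pair up")
    return [
        "get_organelle_from_reads.py",
        "-1", ",".join(forward),  # one comma-joined argument, not four
        "-2", ",".join(reverse),
        "-F", "embplant_mt",
        "-R", "50",
        "-k", "19,21,23,25,27,61,111",
        "-s", seed,
        "-o", out_dir,
    ]


if __name__ == "__main__":
    lanes = ["L4", "L5", "L6", "L7"]
    fwd = [f"./HA1_DSW64714-V_HT2GMCCXY_{lane}_1.fq.gz" for lane in lanes]
    rev = [f"./HA1_DSW64714-V_HT2GMCCXY_{lane}_2.fq.gz" for lane in lanes]
    args = build_getorganelle_args(
        fwd, rev, "./mitochondrion.fasta", "mitochondrion_short_reads"
    )
    print(" ".join(args))
```

Passing the list to `subprocess.run(args)` would avoid shell quoting issues, but that step is left out here on purpose.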

Kinggerm, Jul 19 '19 15:07