After Collect stage, some output files not found

Open davidecarlson opened this issue 4 years ago • 8 comments

I ran the Preprocess and Collect steps according to the README with no apparent errors. However, some expected output seems to be missing, because when I run the Assembly step I get the following error:

First round assembly and merger...
Start merging...
Traceback (most recent call last):
  File "./main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "./main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/home/progs/GAPPadder/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/home/progs/GAPPadder/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/home/progs/GAPPadder/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/<path to results dir>/gappadder/results/merged/../picked_seqs.fa'

Here are the commands that I ran:

python ./main.py -c Preprocess -g /<my path>/gappadder/gappadder_config.json
python ./main.py -c Collect -g /<my path>/gappadder/gappadder_config.json
python ./main.py -c Assembly -g /<my path>/gappadder/gappadder_config.json

Any ideas what could be going wrong? Thanks! Dave

davidecarlson avatar Dec 15 '20 18:12 davidecarlson

It looks like either an earlier step failed before the Assembly step, or the assembly itself failed. Could you post your config.json file?

simoncchu avatar Dec 16 '20 02:12 simoncchu

Thanks for the response. Here is my config.json file:

{
    "draft_genome": {
        "fa": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/canu_bionano_scaffolds_and_contigs.fasta"
    },
    "raw_reads": [
        {
            "left": "/datahome/oenothera/genomic/Illumina_PE/elata/HI.0553.002.Index_7.johst_DNA_R1.fastq",
            "right": "/datahome/oenothera/genomic/Illumina_PE/elata/HI.0553.002.Index_7.johst_DNA_R2.fastq"
        },
        {
            "left": "/datahome/oenothera/genomic/Illumina_MP-NEW/elata_MP_nxtrim_R1.mp.fastq",
            "right": "/datahome/oenothera/genomic/Illumina_MP-NEW/elata_MP_nxtrim_R2.mp.fastq"
        }
    ],
    "alignments": [
        {
            "bam": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/processed/elataMP.sorted.markdup.bam",
            "is": "8178",
            "std": "853"
        },
        {
            "bam": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/processed/elataPE.sorted.markdup.bam",
            "is": "282",
            "std": "19"
        }
    ],
    "software_path": {
        "bwa": "bwa",
        "samtools": "samtools",
        "velvet": "/home/progs/velvet",
        "kmc": "kmc",
        "TERefiner": "/home/progs/GAPPadder/TERefiner_1",
        "ContigsMerger": "/home/progs/GAPPadder/ContigsMerger"
    },
    "parameters": {
        "working_folder": "/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/results",
        "min_gap_size": "50",
        "flank_length": "300",
        "nthreads": "40",
        "verbose": "1"
    },
    "kmer_length": [
        {
            "k": 30,
            "k_velvet": [
                {
                    "k": 29
                },
                {
                    "k": 27
                }
            ]
        },
        {
            "k": 40,
            "k_velvet": [
                {
                    "k": 39
                },
                {
                    "k": 37
                }
            ]
        },
        {
            "k": 50,
            "k_velvet": [
                {
                    "k": 49
                },
                {
                    "k": 47
                }
            ]
        }
    ]
}

Let me know if you need any additional info. Thanks! Dave

davidecarlson avatar Dec 16 '20 02:12 davidecarlson

The config looks good to me. Could you try changing velvet and kmc to the absolute paths of their folders? For example:

"velvet": "/gpfs/scratchfs1/chc12015/tools/velvet-master/",
"kmc": "/gpfs/scratchfs1/chc12015/tools/kmc2.3/",

simoncchu avatar Dec 16 '20 04:12 simoncchu
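
The path suggestion above can be checked before rerunning anything. The following is a minimal sketch, not part of GAPPadder: the function names `check_software_paths` and `check_config_file` are hypothetical, and it only assumes the `software_path` layout shown in the config posted in this thread. It is written for Python 3 (it uses `shutil.which`), independently of whichever interpreter runs `main.py`. Whether a given key should point to a folder or to a binary is something the sketch reports rather than decides.

```python
import json
import os
import shutil


def check_software_paths(config):
    """Return a dict mapping each software_path key to a short status string."""
    report = {}
    for name, value in config.get("software_path", {}).items():
        if os.path.isabs(value):
            # Absolute entries: report whether they are a folder, a file, or absent.
            if os.path.isdir(value):
                report[name] = "absolute directory"
            elif os.path.isfile(value):
                report[name] = "absolute file"
            else:
                report[name] = "missing"
        else:
            # Bare names such as "bwa" only work if they resolve on PATH.
            resolved = shutil.which(value)
            if resolved:
                report[name] = "on PATH: " + resolved
            else:
                report[name] = "not found on PATH"
    return report


def check_config_file(config_path):
    """Load a JSON config file and check its software_path section."""
    with open(config_path) as handle:
        return check_software_paths(json.load(handle))
```

Calling `check_config_file("/<my path>/gappadder/gappadder_config.json")` and printing the result would show at a glance which entries are bare command names and which are absolute folders.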

Thanks, Simon. I changed the kmc path in the config file to the absolute path of the folder that contains the kmc binary (the velvet path in the config was already the absolute path to the directory containing the velvet binaries). I then reran the Preprocess and Collect steps, which once again finished without producing any error messages.

However, when I start the Assembly step it once again fails with the same error:

First round assembly and merger...
Start merging...
Traceback (most recent call last):
  File "./main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "./main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/home/progs/GAPPadder/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/home/progs/GAPPadder/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/home/progs/GAPPadder/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/datahome/oenothera/assembly/bionano_results_axel/cur_results_1297259/gappadder/results/merged/../picked_seqs.fa'

I should note that the "merged" directory in my results contains nothing but empty subdirectories:

ls -l merged
total 0
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 both_unmapped
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 empty_dir
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:32 gap_reads_alignment
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads_for_alignment
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 gap_reads_high_quality
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 kmc_temp
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 kmers
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 temp
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 10:12 unmapped_reads
drwxrwxr-x. 1 davecarlson davecarlson 0 Dec 16 13:40 velvet_temp

Any other suggestions for things I should be changing? Thanks, Dave

davidecarlson avatar Dec 16 '20 18:12 davidecarlson
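
The manual `ls -l merged` check above can be scripted so that it is easy to repeat after each stage. This is a minimal sketch under stated assumptions: the function name `summarize_working_folder` is hypothetical, and the only paths it relies on are the ones visible in the traceback, namely a `merged` folder inside the working folder and `picked_seqs.fa` one level above `merged`.

```python
import os


def summarize_working_folder(working_folder):
    """List empty subdirectories of merged/ and check for picked_seqs.fa."""
    merged = os.path.join(working_folder, "merged")
    empty = []
    if os.path.isdir(merged):
        for name in sorted(os.listdir(merged)):
            path = os.path.join(merged, name)
            # A subdirectory with no entries at all counts as empty.
            if os.path.isdir(path) and not os.listdir(path):
                empty.append(name)
    # The traceback opens <working_folder>/merged/../picked_seqs.fa,
    # which resolves to <working_folder>/picked_seqs.fa.
    picked = os.path.join(working_folder, "picked_seqs.fa")
    return {
        "merged_exists": os.path.isdir(merged),
        "empty_subdirs": empty,
        "picked_seqs_exists": os.path.isfile(picked),
    }
```

Running it on the `working_folder` value from the config right after the Collect step would show whether any reads were collected before Assembly is attempted.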

Could you check whether /home/progs/GAPPadder/ContigsMerger and /home/progs/GAPPadder/TERefiner_1 run properly? Did you compile them yourself, or are you using the bundled binaries directly? On some machines they need to be recompiled.

simoncchu avatar Dec 17 '20 00:12 simoncchu

Hi Simon,

I used the versions bundled with GAPPadder. It's a little hard to say if they're working properly. Here is the output for ContigsMerger:

Arrange error! 0 6

The output for TERefiner_1:

Please check parameters setting!

Is this the expected output when they are run with no input?

davidecarlson avatar Dec 18 '20 00:12 davidecarlson
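
One way to separate "the binary cannot be started at all" from "the binary starts and complains about missing arguments" is to launch it and record what happens. The sketch below is an illustration only: `probe_binary` is a hypothetical helper, and it makes no claim about what ContigsMerger or TERefiner_1 should print. An `OSError` at launch covers cases such as a missing file, missing execute permission, or a binary built for a different platform.

```python
import subprocess


def probe_binary(path, args=(), timeout=30):
    """Try to launch a program and report whether it started and what it printed."""
    try:
        proc = subprocess.run(
            [path] + list(args),
            stdin=subprocess.DEVNULL,      # never wait for interactive input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,      # merge stderr into the captured output
            timeout=timeout,
        )
    except OSError as err:
        # The operating system refused to start the program.
        return {"launched": False, "returncode": None, "detail": str(err)}
    except subprocess.TimeoutExpired:
        return {"launched": True, "returncode": None, "detail": "timed out"}
    output = proc.stdout.decode(errors="replace").strip()
    return {"launched": True, "returncode": proc.returncode, "detail": output}
```

For example, `probe_binary("/home/progs/GAPPadder/ContigsMerger")` returning `launched: True` together with a short message would suggest the binary at least executes on this machine, which is a weaker statement than saying it works correctly on real input.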

Hi Simon,

I have tried to use GAPPadder and I am getting exactly the same issues (the program not finishing and producing empty directories) that Dave reported earlier in this ticket.

Below are the details needed to trace the problem:

script

#!/bin/bash

#SBATCH --mail-type=end,fail
#SBATCH --job-name="gap"
#SBATCH --nodes=1
#SBATCH --cpus-per-task=12
#SBATCH --time=12:00:00
#SBATCH --mem=32G
#SBATCH --partition=pall
#SBATCH --output=gap_%j.out
#SBATCH --error=gap_%j.err

module add UHTS/Aligner/bwa/0.7.17
module add UHTS/Analysis/samtools/1.10
module add UHTS/Assembler/velvet/1.2.10


# Preprocess the draft genome to get the gap positions and flank regions
python main.py -c Preprocess -g configuration.json

# Collect reads for each gap
python main.py -c Collect -g configuration.json

# Construct the gap sequence and pick the best one:
python main.py -c Assembly -g configuration.json

stdout

samtools view /path/2/align.bam "draft_name" | python collect_reads_for_gaps.py /path/2/gap_positions.txt 30 /path/2/1_is300/ 300 50 250 -
samtools view /path/2/align.bam "draft_name" | python collect_discordant_low_mapq_reads.py /path/2/1_is300/ -
First round assembly and merger...
Start merging...

stderr

Traceback (most recent call last):
  File "main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "main.py", line 257, in main_func
    drc.merge_dispatch_reads_for_gaps_v2(left_reads, right_reads)
  File "/path/2/run_multi_threads_discordant.py", line 213, in merge_dispatch_reads_for_gaps_v2
    temp_field=id_fields[0].split("/")
IndexError: list index out of range
Traceback (most recent call last):
  File "main.py", line 283, in <module>
    main_func(scommand,sfconfig)
  File "main.py", line 274, in main_func
    gap_assembler.assemble_pipeline()
  File "/path/2/assemble_gaps.py", line 339, in assemble_pipeline
    id_remain=self.pick_already_constructed(contigs_select, fa_list, sf_picked)
  File "/path/2/assemble_gaps.py", line 321, in pick_already_constructed
    m_picked=contigs_select.get_already_picked(sf_picked)
  File "/path/2/pick_contigs.py", line 576, in get_already_picked
    with open(sf_picked) as fin_picked:
IOError: [Errno 2] No such file or directory: u'/path/2/merged/../picked_seqs.fa'

configuration.json

{
    "draft_genome": {
        "fa": "/path/2/draft.fasta"
    },
    "raw_reads": [
        {
            "left": "/path/2/reads_1.fastq.gz",
            "right": "/path/2/reads_2.fastq.gz"
        }
    ],
    "alignments": [
        {
            "bam": "/path/2/align.bam",
            "is": "300",
            "std": "50"
        }
    ],
    "software_path": {
        "bwa": "bwa",
        "samtools": "samtools",
        "velvet": "velvet",
        "kmc": "/path/2/KMC/bin/",
        "TERefiner": "/path/2/TERefiner_1",
        "ContigsMerger": "/path/2/ContigsMerger"
    },
    "parameters": {
        "working_folder": "/path/2/dir",
        "min_gap_size": "2",
        "flank_length": "300",
        "nthreads": "12",
        "verbose": "1"
    },
    "kmer_length": [
        {
            "k": 30,
            "k_velvet": [
                {
                    "k": 29
                },
                {
                    "k": 27
                }
            ]
        },
        {
            "k": 40,
            "k_velvet": [
                {
                    "k": 39
                },
                {
                    "k": 37
                }
            ]
        },
        {
            "k": 50,
            "k_velvet": [
                {
                    "k": 49
                },
                {
                    "k": 47
                }
            ]
        }
    ]
}

Would you have an idea of what is happening?

Best

frihaka avatar Feb 26 '21 16:02 frihaka
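
The stderr above contains two tracebacks: an `IndexError` raised from `main.py` line 257, and then the `IOError` from the Assembly step. Because the batch script runs the three commands unconditionally, a failure in an earlier stage (presumably Collect, though that is an inference from the log order) does not stop Assembly from starting on incomplete output. The sketch below shows one way to stop at the first failing stage. `run_stages` is a hypothetical helper, and it assumes only that a command which dies with an uncaught Python exception exits with a non-zero status.

```python
import subprocess


def run_stages(commands):
    """Run (name, argv) pairs in order and stop at the first non-zero exit status.

    Returns (names_of_completed_stages, failure), where failure is None if
    every stage succeeded, or a (name, exit_status) tuple otherwise.
    """
    completed = []
    for name, argv in commands:
        status = subprocess.call(argv)
        if status != 0:
            # Do not start later stages on top of a failed one.
            return completed, (name, status)
        completed.append(name)
    return completed, None
```

With the commands from the script above, the stage list would be built as `[(s, ["python", "main.py", "-c", s, "-g", "configuration.json"]) for s in ("Preprocess", "Collect", "Assembly")]`. In a plain bash script, placing `set -e` near the top has a similar effect.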

Hello Simon,

I have tried to use GAPPadder as well and I have the same issues (program not finishing and output directories empty) as mentioned above. I have tried to recompile ContigsMerger and TERefiner_1, but it didn't change anything.

Do you have any idea what is going wrong?

Thanks, Anne

anne-gcd avatar Nov 09 '21 12:11 anne-gcd