Can't find some sequence of mitochondrial gene

Open JuFengWang opened this issue 4 years ago • 2 comments

Hello, I am new to de novo assembly. I used minimap2, miniasm, racon and pilon to assemble a genome (about 30 Mb), but in the final file I can't find the minicircle sequences (600 bp to 3 kb) of the mitochondrial genome. Can you give me some advice? My commands are as follows:

Raw read overlap detection (minimap2)
    minimap2 -x ava-pb -t8 T_mus_0.fq T_mus_0.fq | gzip -1 >T_mus_1.paf.gz

OLC-based de novo assembly (miniasm)
    miniasm -f T_mus_0.fq T_mus_1.paf.gz > T_mus_2.gfa

GFA-to-FASTA conversion
    awk '/^S/{print ">"$2"\n"$3}' T_mus_2.gfa | seqkit seq > T_mus_3.fasta

Long Read Polishing

Long read remapping - Iteration 1 (minimap2)
    minimap2 T_mus_3.fasta T_mus_0.fq > T_mus_4.paf

Long read consensus call - Iteration 1 (racon)
    racon -t 4 T_mus_0.fq T_mus_4.paf T_mus_3.fasta T_mus_5.fasta

Long read remapping - Iteration 2 (minimap2)
    minimap2 T_mus_5.fasta T_mus_0.fq > T_mus_6.paf

Long read consensus call - Iteration 2 (racon)
    racon -t 4 T_mus_0.fq T_mus_6.paf T_mus_5.fasta T_mus_7.fasta

Long read remapping - Iteration 3 (minimap2)
    minimap2 T_mus_7.fasta T_mus_0.fq > T_mus_8.paf

Long read consensus call - Iteration 3 (racon)
    racon -t 4 T_mus_0.fq T_mus_8.paf T_mus_7.fasta T_mus_9.fasta

Short Read Polishing

Read trimming (fastp)
    fastp -q 30 -5 -l 100 -i L7_1_clean.fq -I L7_2_clean.fq -o i1_clean_1.fq -O i1_clean_2.fq

BWA genome indexing & short read remapping (BWA, samtools)
    bwa index T_mus_9.fasta
    bwa mem -t 8 T_mus_9.fasta i1_clean_1.fq i1_clean_2.fq | samtools sort -@ 8 -O bam -o T_mus_10.bam
    samtools index -@ 8 T_mus_10.bam

Short read consensus call - Iteration 1 (pilon)
    java -Xmx16G -jar pilon-1.23.jar --genome T_mus_9.fasta --frags T_mus_10.bam --fix snps --output T_mus_11

JuFengWang avatar Mar 29 '20 02:03 JuFengWang
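
One way to narrow down where the minicircles are lost is to check whether the miniasm GFA already lacks segments in the 600 bp to 3 kb range, before any polishing. The sketch below is only an illustration: the function name `gfa_segments_in_range` is hypothetical, and it assumes tab-separated GFA with the sequence stored inline in the third column of each S line, which is the layout the awk conversion in the post above relies on.

```shell
# Hypothetical helper (not part of miniasm or minimap2).
# Print the name and sequence length of every GFA segment (S line)
# whose length lies within [min_len, max_len] bp.
# Usage: gfa_segments_in_range <file.gfa> <min_len> <max_len>
gfa_segments_in_range() {
    awk -F'\t' -v min="$2" -v max="$3" '
        $1 == "S" {
            len = length($3)              # column 3 holds the sequence
            if (len >= min && len <= max)
                print $2 "\t" len         # segment name, length
        }
    ' "$1"
}
```

For example, `gfa_segments_in_range T_mus_2.gfa 600 3000` would list candidate segments of that size. An empty result would suggest the sequences are absent from the assembly graph itself rather than removed by racon or pilon.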

Tip: use gfak extract instead of your awk script.

tseemann avatar Mar 30 '20 00:03 tseemann

Thanks for your advice, but it doesn't solve my problem. When I try the following command:
    miniasm -f T_musculi_0.fq T_musculi_1.paf.gz > T_musculi_2.gfa -s 200
it still fails for about 1/3 of the minicircles, even though they exist in the raw data (T_musculi_0.fq).

JuFengWang avatar Mar 30 '20 01:03 JuFengWang
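
A complementary check, sketched here under stated assumptions, is to count how many reads in the minicircle size range have any overlap with another read in the all-vs-all PAF, since a read with no overlaps cannot contribute to an overlap-based assembly. The function name `reads_with_overlaps` is hypothetical. It relies only on the standard PAF columns (1 = query name, 2 = query length, 6 = target name, 7 = target length) and inspects both sides of each record, because an all-vs-all run may report a given pair in one direction only. Read length alone does not prove that a read is a minicircle, so the count is a rough indicator.

```shell
# Hypothetical helper (not part of minimap2 or miniasm).
# Read PAF on standard input and count distinct reads with length in
# [min_len, max_len] that overlap at least one *other* read.
# Usage: gzip -dc overlaps.paf.gz | reads_with_overlaps <min_len> <max_len>
reads_with_overlaps() {
    awk -F'\t' -v min="$1" -v max="$2" '
        $1 != $6 {                                    # skip self-overlaps
            if ($2 >= min && $2 <= max && !seen[$1]++) n++   # query side
            if ($7 >= min && $7 <= max && !seen[$6]++) n++   # target side
        }
        END { print n + 0 }                           # print 0 if none
    '
}
```

For example, `gzip -dc T_mus_1.paf.gz | reads_with_overlaps 600 3000` gives the number of such reads. Comparing it with the number of reads of that size in the raw FASTQ shows how many never enter the overlap set.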