minimap2
Can't find some mitochondrial gene sequences
Hello, I am new to de novo assembly. I used minimap2, miniasm, racon and pilon to assemble a genome (about 30 Mb), but in the final file I can't find the minicircle sequences, which are 600 bp to 3 kb long and belong to the mitochondrial genome. Can you give me some advice?
My commands are as follows:
Raw read overlap detection (minimap2)
minimap2 -x ava-pb -t8 T_mus_0.fq T_mus_0.fq | gzip -1 >T_mus_1.paf.gz
OLC-based de novo assembly (miniasm)
miniasm -f T_mus_0.fq T_mus_1.paf.gz > T_mus_2.gfa
GFA-to-Fasta conversion
awk '/^S/{print ">"$2"\n"$3}' T_mus_2.gfa | seqkit seq > T_mus_3.fasta
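For readers unfamiliar with the awk one-liner above, here is a minimal sketch of what it does, run on a toy GFA file. The segment names and sequences are made up for illustration, and `gfa_to_fasta` is a hypothetical helper name, not part of any tool in this pipeline.

```shell
# Toy GFA with one header (H) line, two segment (S) lines and one link (L) line.
# All names and sequences here are invented for the example.
tmp_gfa=$(mktemp)
printf 'H\tVN:Z:1.0\nS\tutg000001l\tACGTACGT\nS\tutg000002c\tGGCCTTAA\nL\tutg000001l\t+\tutg000002c\t-\t4M\n' > "$tmp_gfa"

gfa_to_fasta() {
    # For every line starting with "S", print ">name" and then the sequence.
    # Header and link lines are skipped.
    awk '/^S/{print ">"$2"\n"$3}' "$1"
}

gfa_to_fasta "$tmp_gfa"
```

Only the two S lines become FASTA records, so any sequence that never made it into a segment of the GFA cannot appear in the FASTA either.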
Long Read Polishing
Long read remapping - Iteration 1 (minimap2)
minimap2 T_mus_3.fasta T_mus_0.fq > T_mus_4.paf
Long read consensus call - Iteration 1 (racon)
racon -t 4 T_mus_0.fq T_mus_4.paf T_mus_3.fasta T_mus_5.fasta
Long read remapping - Iteration 2 (minimap2)
minimap2 T_mus_5.fasta T_mus_0.fq > T_mus_6.paf
Long read consensus call - Iteration 2 (racon)
racon -t 4 T_mus_0.fq T_mus_6.paf T_mus_5.fasta T_mus_7.fasta
Long read remapping - Iteration 3 (minimap2)
minimap2 T_mus_7.fasta T_mus_0.fq > T_mus_8.paf
Long read consensus call - Iteration 3 (racon)
racon -t 4 T_mus_0.fq T_mus_8.paf T_mus_7.fasta T_mus_9.fasta
Short Read Polishing
Read trimming (fastp)
fastp -q 30 -5 -l 100 -i L7_1_clean.fq -I L7_2_clean.fq -o i1_clean_1.fq -O i1_clean_2.fq
BWA genome indexing & short read remapping (BWA, samtools)
bwa index T_mus_9.fasta
bwa mem -t 8 T_mus_9.fasta i1_clean_1.fq i1_clean_2.fq | samtools sort -@ 8 -O bam -o T_mus_10.bam
samtools index -@ 8 T_mus_10.bam
Short read consensus call - Iteration 1 (pilon)
java -Xmx16G -jar pilon-1.23.jar --genome T_mus_9.fasta --frags T_mus_10.bam --fix snps --output T_mus_11
Tip: use gfak extract instead of your awk script.
Thanks for your advice, but it doesn't solve my problem. When I try the following command:
miniasm -f T_musculi_0.fq T_musculi_1.paf.gz > T_musculi_2.gfa -s 200
it still fails for about 1/3 of the minicircles, even though they exist in the raw data (T_musculi_0.fq).
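A quick way to see how many raw reads fall in the minicircle size range is to count reads by length. This is only a sketch: it assumes a FASTQ with exactly four lines per record, `count_reads_in_range` is a hypothetical helper name, and the toy reads are invented.

```shell
count_reads_in_range() {
    # $1 = FASTQ file, $2 = minimum length, $3 = maximum length (inclusive).
    # Assumes four lines per record, so the sequence is every 4th line starting at line 2.
    awk -v min="$2" -v max="$3" \
        'NR % 4 == 2 { n = length($0); if (n >= min && n <= max) c++ } END { print c + 0 }' "$1"
}

# Toy FASTQ with three reads of length 4, 10 and 2 (made-up data).
tmp_fq=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nACGTACGTAC\n+\nIIIIIIIIII\n@r3\nAC\n+\nII\n' > "$tmp_fq"

# Count reads whose length is between 3 and 8 bases.
count_reads_in_range "$tmp_fq" 3 8
```

On the real data the call would look like `count_reads_in_range T_musculi_0.fq 600 3000`.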