Next Step after generating assembly.gfa file.
Hello.
Can anyone help guide me on what to do next after generating the .gfa file? I have a PacBio dataset and used the following commands to generate the .gfa file.
minimap2 -x ava-pb -t 32 longread.fastq.gz longread.fastq.gz | gzip -1 > reads.paf.gz
miniasm -f longread.fastq.gz miniasm/reads.paf.gz > miniasm/assembly.gfa
- Can you share information about the structure of the .gfa file? What does each column represent?
- There is a first row that starts with "S", then a label followed by a long sequence
- Then there are some corresponding lines that start with "a" followed by the same label
Example (the sequence is cropped for display here):
S utg000002l GCCATATCCTTGAGGAGATCGTTCAGCGCGCAGAACCGAAAACTGTAT LN:i:87496
a utg000002l 0 SRR9694937.41145:1-8573 - 673
- I know a GFA file can be visualized in Bandage, but how do I get the FASTA assembly file for further downstream analysis such as polishing?
There is a GFA file specification, which might help. If you want to do further downstream analysis, you need to extract the sequences from the S lines, like
awk '/^S/{print ">"$2"\n"$3}' ONTmin.gfa | fold > ONTmin_IT0.fasta
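For anyone who prefers Python, here is a minimal sketch of the same extraction. It assumes the GFA file is tab-separated, with the segment name in column 2 and the sequence in column 3 of each S line; the function name `gfa_to_fasta` and the 80-column wrap width are illustrative choices, not part of miniasm.

```python
def gfa_to_fasta(gfa_lines, width=80):
    """Yield FASTA lines (header, then wrapped sequence) for each S line."""
    for line in gfa_lines:
        # Only segment lines carry sequence; skip 'a', 'x', 'L', etc.
        if not line.startswith("S\t"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[2] == "*":
            # '*' means the sequence is not stored in the GFA file.
            continue
        name, seq = fields[1], fields[2]
        yield ">" + name
        # Wrap the sequence to a fixed width, like `fold` does.
        for i in range(0, len(seq), width):
            yield seq[i:i + width]


# Usage (file names taken from the awk command above):
# with open("ONTmin.gfa") as gfa, open("ONTmin_IT0.fasta", "w") as out:
#     for fasta_line in gfa_to_fasta(gfa):
#         out.write(fasta_line + "\n")
```

The generator processes one line at a time, so it never needs to hold the whole assembly in memory.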
I also have some questions, because the specification does not describe these details of the GFA file. For lines that start with "a", what does "a" mean, and what does each column represent? Also, the names in the utg000002l column of my result end in "l" or "c"; what does that mean?
I guess "a" is a tag, but there is no explanation for the "a" lines.
I would like to select high-quality sequences that are more conducive to assembly based on the alignment results from miniasm, so I need a detailed understanding of the GFA file. However, I am encountering many problems now. If someone has done similar work, I hope to receive your help. Thank you.
From #41 I learned the meaning of the "l" or "c" at the end of a contig name.
c means circular. l means linear.
@zhaolei6116 Thank you for explaining.
I found some information about the 'a' and 'x' lines in #71 and at https://manpages.debian.org/testing/miniasm/miniasm.1.en.html
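To make the pieces above concrete, here is a rough Python sketch that splits an 'a' line into named fields and reads the l/c suffix of a unitig name. The column meanings (unitig name, start on the unitig, read name with its start-end range, strand, length) are my reading of the example line in this thread and the linked man page, so treat them as an assumption and check them against your miniasm version. `TilingEntry`, `parse_a_line` and `unitig_topology` are hypothetical names used only for illustration.

```python
from typing import NamedTuple, Optional


class TilingEntry(NamedTuple):
    unitig: str        # unitig name, e.g. utg000002l
    unitig_start: int  # assumed: start position of this read on the unitig
    read: str          # read name without the :start-end suffix
    read_start: int    # assumed: start of the used interval on the read
    read_end: int      # assumed: end of the used interval on the read
    strand: str        # '+' or '-'
    length: int        # assumed: length this read contributes to the unitig


def parse_a_line(line: str) -> Optional[TilingEntry]:
    """Parse one 'a' line; return None for any other kind of line."""
    fields = line.split()
    if len(fields) < 6 or fields[0] != "a":
        return None
    # Split on the last ':' because read names may contain ':' themselves.
    read, _, span = fields[3].rpartition(":")
    start, _, end = span.partition("-")
    return TilingEntry(fields[1], int(fields[2]), read,
                       int(start), int(end), fields[4], int(fields[5]))


def unitig_topology(name: str) -> str:
    """Map the trailing 'l' or 'c' of a unitig name to its topology."""
    return {"l": "linear", "c": "circular"}.get(name[-1:], "unknown")


entry = parse_a_line("a utg000002l 0 SRR9694937.41145:1-8573 - 673")
print(entry.read, entry.strand, unitig_topology(entry.unitig))
```

Collecting the `read` field across all 'a' lines gives the set of reads that ended up in the unitig layout, which may be a starting point for selecting reads based on the miniasm results.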