miniasm

Next Step after generating assembly.gfa file.

Open • ufaroooq opened this issue 1 year ago • 4 comments

Hello.

Can anyone help or guide me on what to do next after generating the .gfa file? I have a PacBio dataset and used the following commands to generate the .gfa file:

minimap2 -x ava-pb -t 32 longread.fastq.gz longread.fastq.gz | gzip -1 > miniasm/reads.paf.gz
miniasm -f longread.fastq.gz miniasm/reads.paf.gz > miniasm/assembly.gfa

  1. Can you share information about the structure of the .gfa file? What does each column represent?
  • There is a first row that starts with "S", then a label, followed by a long sequence.
  • Then there are some corresponding lines that start with "a", followed by the same label.

Example (the sequence is cropped here for display):

S utg000002l GCCATATCCTTGAGGAGATCGTTCAGCGCGCAGAACCGAAAACTGTAT LN:i:87496
a utg000002l 0 SRR9694937.41145:1-8573 - 673

  2. I know a GFA file can be visualized in Bandage, but how do I get a FASTA assembly file for further downstream analysis such as polishing?
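To explore the structure described in question 1, here is a minimal sketch. It assumes a tab-separated miniasm GFA like the example above, with the sequence in column 3 of each S line. For every S line it prints the unitig name, the length of the stored sequence, the value of the standard `LN:i:` length tag, and how many "a" lines carry the same label. The function name `gfa_overview` is hypothetical.

```shell
# gfa_overview (hypothetical helper): summarise the S lines of a miniasm GFA
# and count the "a" lines attached to each one. Reads files or stdin.
# Output columns: name, sequence length, LN tag value, number of "a" lines.
gfa_overview() {
    awk -F'\t' '
        $1 == "S" {
            order[++n] = $2              # remember the order of S lines
            seqlen[$2] = length($3)      # column 3 holds the sequence
            for (i = 4; i <= NF; i++)    # optional tags start at column 4
                if ($i ~ /^LN:i:/) ln[$2] = substr($i, 6)
        }
        $1 == "a" { nreads[$2]++ }       # column 2 repeats the unitig label
        END {
            for (k = 1; k <= n; k++) {
                name = order[k]
                print name "\t" seqlen[name] "\t" ln[name] "\t" (nreads[name] + 0)
            }
        }
    ' "$@"
}
```

Usage, with the paths from the commands above: `gfa_overview miniasm/assembly.gfa`. If the sequence length and the LN value disagree, the sequence column was probably not written in full.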

ufaroooq, Jan 29 '24 12:01

There is a GFA file specification that may help. For further downstream analysis, you need to extract the sequences from the S lines, for example:

awk '/^S/{print ">"$2"\n"$3}' ONTmin.gfa | fold > ONTmin_IT0.fasta
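The one-liner above can be written as a commented function. This is a sketch, not an official miniasm tool. It assumes the sequences are stored inline in column 3 of each S line, as in the output quoted in the question. It skips S lines whose sequence is `*`, the GFA placeholder for a sequence that is not stored, and wraps output at 60 columns with `fold -w 60`. The name `gfa_to_fasta` is hypothetical.

```shell
# gfa_to_fasta (hypothetical helper): write every S-line sequence of a GFA
# as a FASTA record, using the segment name (column 2) as the header.
gfa_to_fasta() {
    awk -F'\t' '
        # "*" is the GFA placeholder for a sequence that is not stored inline
        $1 == "S" && $3 != "*" { print ">" $2; print $3 }
    ' "$@" | fold -w 60   # wrap at 60 characters (would also wrap very long headers)
}
```

Usage: `gfa_to_fasta miniasm/assembly.gfa > miniasm/assembly.fasta`. The resulting FASTA can then be passed to whatever polishing tool you use.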

I also have some questions. The specification doesn't give a detailed annotation of these lines. What do lines starting with "a" mean? In my results, the names in the utg000002l column end with either "l" or "c"; what does that mean? And in an "a" line, what does each column represent?

I guess "a" is a tag, but there is no explanation of "a" lines.

I would like to select high-quality sequences that are more conducive to assembly based on the alignment results from miniasm, so I need a detailed understanding of the GFA file. However, I am encountering many problems now. If someone has done similar work, I hope to receive your help. Thank you.

zhaolei6116, Nov 21 '24 00:11

From #41 I learned the meaning of the "l" or "c" at the end of a contig name:

c means circular. l means linear.
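To apply this rule to a whole assembly, here is a small sketch that tags each unitig by its name suffix. It assumes the rule quoted from #41 holds for every S line and that the file is tab-separated. The name `gfa_topology` is hypothetical.

```shell
# gfa_topology (hypothetical helper): label each unitig as circular or
# linear from the last character of its S-line name, following #41:
# "c" = circular, "l" = linear; anything else is reported as "unknown".
gfa_topology() {
    awk -F'\t' '
        $1 == "S" {
            last = substr($2, length($2), 1)
            if (last == "c")      topo = "circular"
            else if (last == "l") topo = "linear"
            else                  topo = "unknown"
            print $2 "\t" topo
        }
    ' "$@"
}
```

For example, `gfa_topology miniasm/assembly.gfa | awk -F'\t' '$2 == "circular"'` lists only the unitigs whose names end in "c".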

zhaolei6116, Nov 21 '24 01:11

@zhaolei6116 Thank you for explaining.

ufaroooq, Nov 21 '24 15:11

There is some information about 'a' lines and 'x' lines in #71 and at https://manpages.debian.org/testing/miniasm/miniasm.1.en.html
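For working with the "a" lines directly, here is a sketch that splits each one into separate columns. The column names in the comments are assumptions inferred from the example line in the question (`a utg000002l 0 SRR9694937.41145:1-8573 - 673`), so check #71 and the man page above for the authoritative definitions. In particular, treating the last column as the number of bases a read contributes before the next read starts is a guess. The name `gfa_read_layout` is hypothetical.

```shell
# gfa_read_layout (hypothetical helper): one tab-separated row per "a" line:
# unitig, offset, read name, read start, read end, strand, last column.
# Column meanings are ASSUMED from lines such as
#   a  utg000002l  0  SRR9694937.41145:1-8573  -  673
gfa_read_layout() {
    awk -F'\t' -v OFS='\t' '
        $1 == "a" {
            rname = $4; rstart = ""; rend = ""
            # split "name:start-end" at the colon before the trailing range,
            # so read names that themselves contain ":" stay intact
            if (match($4, /:[0-9]+-[0-9]+$/)) {
                rname = substr($4, 1, RSTART - 1)
                split(substr($4, RSTART + 1), se, "-")
                rstart = se[1]; rend = se[2]
            }
            print $2, $3, rname, rstart, rend, $5, $6
        }
    ' "$@"
}
```

The output is easy to sort or filter further, for example by unitig name, when looking at which reads make up each contig.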

zhaolei6116, Nov 22 '24 01:11