minimap2 icon indicating copy to clipboard operation
minimap2 copied to clipboard

Get BED of uncovered regions of ref using paftools?

Open tseemann opened this issue 1 year ago • 2 comments

Givem a PAF file of one genome aligned to a reference, I can get overall coverage percentages with paftools asmstat, but what I would like is a BED file of those regions in the reference genome to which nothing was aligned, or even better, these uncovered regions of the reference somehow denoted in the VCF file from paftools call.

Any pointers appreciated.

tseemann avatar Apr 16 '23 05:04 tseemann

You may try paftools.js call without -f. The R-lines give regions where variants can be called confidently.

If you want to extract regions covered by any alignments, you may

cat aln.paf | awk '/tp:A:[PI]/&&$11>=10000' | cut -f6,8,9 | bedtk merge - > merged.bed

It is recommended to filter poor alignments (e.g. $11>=10000). By default, minimap2 may output many short bad alignments.

lh3 avatar Apr 20 '23 01:04 lh3

I did not notice the different output mode of paftools when no reference is provided.

For others reading this, here is the docs for that: https://github.com/lh3/minimap2/blob/master/misc/README.md#calling-variants-from-haploid-assemblies

As for the tp:A tag - are all secondary alignments tp:A:S always get MAPQ=0 ?

For those who want to see what tags are in the --cs output: https://lh3.github.io/minimap2/minimap2.html#10

My question about UNALIGNED regions still needs more work. I ended up making a BED for the chromosomes themselves, and then using bedtk sub chr.bed aligned.bed to find the inverse. This was needed for bcftools consensus --mark-del '-' --mask unaligned.bed

tseemann avatar May 13 '23 04:05 tseemann