minimap2
minimap2 copied to clipboard
Get BED of uncovered regions of ref using paftools?
Givem a PAF file of one genome aligned to a reference, I can get overall coverage percentages with paftools asmstat
, but what I would like is a BED file of those regions in the reference genome to which nothing was aligned, or even better, these uncovered regions of the reference somehow denoted in the VCF file from paftools call
.
Any pointers appreciated.
You may try paftools.js call without -f
. The R
-lines give regions where variants can be called confidently.
If you want to extract regions covered by any alignments, you may
cat aln.paf | awk '/tp:A:[PI]/&&$11>=10000' | cut -f6,8,9 | bedtk merge - > merged.bed
It is recommended to filter poor alignments (e.g. $11>=10000
). By default, minimap2 may output many short bad alignments.
I did not notice the different output mode of paftools when no reference is provided.
For others reading this, here is the docs for that: https://github.com/lh3/minimap2/blob/master/misc/README.md#calling-variants-from-haploid-assemblies
As for the tp:A
tag - are all secondary alignments tp:A:S
always get MAPQ=0 ?
For those who want to see what tags are in the --cs output: https://lh3.github.io/minimap2/minimap2.html#10
My question about UNALIGNED regions still needs more work.
I ended up making a BED for the chromosomes themselves, and then using bedtk sub chr.bed aligned.bed
to find the inverse. This was needed for bcftools consensus --mark-del '-' --mask unaligned.bed