dipcall

Interpretation of bed files

Open cjain7 opened this issue 2 years ago • 1 comment

$ cat prefix.dip.bed | awk 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
2823519412
$ cat prefix.hap1.bed | awk 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
2690214366
$ cat prefix.hap2.bed | awk 'BEGIN{SUM=0}{ SUM+=$3-$2 }END{print SUM}'
2817991873

According to the documentation: The prefix.dip.bed file gives the confident regions. A base is included in the BED if 1) it is covered by one >=50kb alignment with mapQ>=5 from each parent and 2) it is not covered by other >=10kb alignments in each parent.

Based on this, shouldn't the total interval length in prefix.dip.bed be no larger than that of prefix.hap1.bed or prefix.hap2.bed? In other words, shouldn't prefix.dip.bed be the intersection of the two haplotype-specific BED files? Instead, its total (2823519412 bp) exceeds both prefix.hap1.bed (2690214366 bp) and prefix.hap2.bed (2817991873 bp).

Could you please clarify the relationship between these three BED files?

cjain7 avatar Aug 26 '22 11:08 cjain7

This is probably caused by the sex chromosomes. chrX and chrY are handled differently.

lh3 avatar Aug 26 '22 13:08 lh3
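
One way to check the sex-chromosome explanation is to break the totals down by chromosome and to recompute them with the sex chromosomes left out. The following is only a sketch: the function names `bed_len_by_chrom` and `bed_len_autosomes` are hypothetical helpers, and the chromosome names (`chrX`/`chrY`, or `X`/`Y`) are assumptions that depend on the reference used. It does not reproduce how dipcall itself treats chrX and chrY, which is not spelled out in this thread.

```shell
# Sum interval lengths (end - start) per chromosome for one BED file.
# Assumes standard BED columns: 1 = chromosome, 2 = start, 3 = end.
bed_len_by_chrom() {
    awk '{ sum[$1] += $3 - $2 }
         END { for (c in sum) printf "%s\t%.0f\n", c, sum[c] }' "$1" | sort -k1,1
}

# Sum interval lengths over all chromosomes except the sex chromosomes.
# The names chrX/chrY/X/Y are an assumption about the reference naming.
bed_len_autosomes() {
    awk '$1 != "chrX" && $1 != "chrY" && $1 != "X" && $1 != "Y" { sum += $3 - $2 }
         END { printf "%.0f\n", sum }' "$1"
}

# Report both views for each of the three files, skipping any that are missing.
for f in prefix.dip.bed prefix.hap1.bed prefix.hap2.bed; do
    [ -f "$f" ] || continue
    echo "== $f =="
    bed_len_by_chrom "$f"
    printf 'autosomes only\t%s\n' "$(bed_len_autosomes "$f")"
done
```

If the autosome-only total for prefix.dip.bed is no larger than the totals for prefix.hap1.bed and prefix.hap2.bed, the surplus in the whole-genome sum comes from chrX and chrY, which would be consistent with the reply above.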