bedtools2
intersect using split creates weird behavior in v2.30 compared to v2.28
Hi,
I observed some unexpected behavior when running bedtools intersect with -split on single-exon intervals in v2.30. With -split, the -f threshold appears to be ignored: overlaps of only 5 bp are reported even though -f 0.51 is set (possibly related to #773). In addition, many overlaps that are reported without -split are missing when -split is turned on.
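To make the expectation concrete, here is a minimal sketch of the arithmetic behind -f 0.51 for the first record of input A (28 bp), checked against the two 2L hits that v2.30 reports with -split. The helper names `overlap_bp` and `passes_min_fraction` are made up for illustration; this is not how bedtools implements the check, only the plain fraction-of-A calculation.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open BED intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def passes_min_fraction(a_start, a_end, b_start, b_end, min_frac):
    """True if the overlap covers at least min_frac of interval A."""
    a_len = a_end - a_start
    if a_len <= 0:
        return False
    return overlap_bp(a_start, a_end, b_start, b_end) / a_len >= min_frac


# First record of input A (2L, 28 bp) and the two B hits reported by v2.30 with -split.
a = (11541138, 11541166)
hits = [(11540637, 11541143), (11541141, 11541213)]

for b in hits:
    bp = overlap_bp(*a, *b)
    frac = round(bp / (a[1] - a[0]), 3)
    print(b, bp, frac, passes_min_fraction(*a, *b, 0.51))
# (11540637, 11541143) 5 0.179 False
# (11541141, 11541213) 25 0.893 True
```

By this calculation the 5 bp hit covers about 18% of the A interval and should be filtered out by -f 0.51, while the 25 bp hit should be kept.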
input A
2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0
2R 3512790 3512818 3 1 - 3512790 3512818 255,0,0 1 28 0
2R 3890488 3890516 4 1 - 3890488 3890516 255,0,0 1 28 0
Un_CP007073v1 4644 4672 5 1 - 4644 4672 255,0,0 1 28 0
Un_CP007080v1 63195 63223 6 1 + 63195 63223 255,0,0 1 28 0
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0
X 21807157 21807185 8 1 + 21807157 21807185 255,0,0 1 28 0
input B
I could not produce a minimal example that replicates the behavior 1:1, so I have attached the full BED file below.
result 2.28 with split
bedtools intersect -nobuf -wao -f 0.51 -split -a A.bed -b B.bed
2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 2L 11541141 11541213 TE::TE:LTR 0 - 25
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2163108 2169452 TE::TE:LTR 0 + 25
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2119244 2245971 mRNA::EX:intron 0 - 28
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2119244 2261828 mRNA::EX:intron 0 - 28
2R 3512790 3512818 3 1 - 3512790 3512818 255,0,0 1 28 0 2R 3506591 3512815 TE::TE:LTR 0 + 25
2R 3890488 3890516 4 1 - 3890488 3890516 255,0,0 1 28 0 2R 3884301 3890513 TE::TE:LTR 0 + 25
Un_CP007073v1 4644 4672 5 1 - 4644 4672 255,0,0 1 28 0 Un_CP007073v1 4577 4669 TE::TE:LTR 0 + 25
Un_CP007080v1 63195 63223 6 1 + 63195 63223 255,0,0 1 28 0 Un_CP007080v1 63198 63356 TE::TE:LTR 0 - 25
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7023094 7031834 mRNA::EX:intron 0 - 28
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7023094 7031834 mRNA::EX:intron 0 - 28
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7024160 7030484 TE::TE:LTR 0 + 25
X 21807157 21807185 8 1 + 21807157 21807185 255,0,0 1 28 0 X 21807160 21807319 TE::TE:LTR 0 - 25
result 2.30 with split
bedtools intersect -nobuf -wao -f 0.51 -split -a A.bed -b B.bed
2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 2L 11540637 11541143 TE::TE:LTR 0 - 5
2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 2L 11541141 11541213 TE::TE:LTR 0 - 25
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 . -1 -1 . -1 . 0
2R 3512790 3512818 3 1 - 3512790 3512818 255,0,0 1 28 0 . -1 -1 . -1 . 0
2R 3890488 3890516 4 1 - 3890488 3890516 255,0,0 1 28 0 . -1 -1 . -1 . 0
Un_CP007073v1 4644 4672 5 1 - 4644 4672 255,0,0 1 28 0 . -1 -1 . -1 . 0
Un_CP007080v1 63195 63223 6 1 + 63195 63223 255,0,0 1 28 0 Un_CP007080v1 62694 63200 TE::TE:LTR 0 - 5
Un_CP007080v1 63195 63223 6 1 + 63195 63223 255,0,0 1 28 0 Un_CP007080v1 63198 63356 TE::TE:LTR 0 - 25
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 . -1 -1 . -1 . 0
X 21807157 21807185 8 1 + 21807157 21807185 255,0,0 1 28 0 X 21807160 21807319 TE::TE:LTR 0 - 25
X 21807157 21807185 8 1 + 21807157 21807185 255,0,0 1 28 0 X 21806656 21807162 TE::TE:LTR 0 - 5
result 2.30 without split
bedtools intersect -nobuf -wao -f 0.51 -a A.bed -b B.bed
2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 2L 11541141 11541213 TE::TE:LTR 0 - 25
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2163108 2169452 TE::TE:LTR 0 + 25
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2119244 2245971 mRNA::EX:intron 0 - 28
2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 2R 2119244 2261828 mRNA::EX:intron 0 - 28
2R 3512790 3512818 3 1 - 3512790 3512818 255,0,0 1 28 0 2R 3506591 3512815 TE::TE:LTR 0 + 25
2R 3890488 3890516 4 1 - 3890488 3890516 255,0,0 1 28 0 2R 3884301 3890513 TE::TE:LTR 0 + 25
Un_CP007073v1 4644 4672 5 1 - 4644 4672 255,0,0 1 28 0 Un_CP007073v1 4577 4669 TE::TE:LTR 0 + 25
Un_CP007080v1 63195 63223 6 1 + 63195 63223 255,0,0 1 28 0 Un_CP007080v1 63198 63356 TE::TE:LTR 0 - 25
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7023094 7031834 mRNA::EX:intron 0 - 28
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7023094 7031834 mRNA::EX:intron 0 - 28
X 7030459 7030487 7 1 - 7030459 7030487 255,0,0 1 28 0 X 7024160 7030484 TE::TE:LTR 0 + 25
X 21807157 21807185 8 1 + 21807157 21807185 255,0,0 1 28 0 X 21807160 21807319 TE::TE:LTR 0 - 25
Thank you, Dominik
@dominik-handler I am sorry it has taken me so long to respond to this. This is indeed a bug, and I have found the source of the problem. However, in doing so I have realized that it requires some thought about how best to handle fractional overlap when using -split, as there are some interesting corner cases where the appropriate behavior is not entirely clear (at least to me). I need to spend some time thinking about and modeling all of the cases. I hope to track the progress in this issue over the coming weeks.
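As one possible illustration of such a corner case (a sketch under assumptions, not a description of what bedtools does or will do): for a multi-block BED12 record, the overlapping bases can be counted per block, but the denominator of the fraction could be either the summed block length or the full span. The record below is hypothetical and the helper names are invented for this example.

```python
def blocks(chrom_start, block_sizes, block_starts):
    """Expand BED12 block fields into absolute half-open (start, end) intervals."""
    return [(chrom_start + s, chrom_start + s + n)
            for n, s in zip(block_sizes, block_starts)]


def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def split_overlap_bp(a_blocks, b_start, b_end):
    """Overlapping bases counted only within the blocks of A."""
    return sum(overlap_bp(s, e, b_start, b_end) for s, e in a_blocks)


# Hypothetical two-block record: span 100-200, blocks 100-110 and 190-200.
a_start, a_end = 100, 200
a_blocks = blocks(a_start, [10, 10], [0, 90])
b = (105, 195)

exonic_bp = split_overlap_bp(a_blocks, *b)                    # 5 + 5 = 10
frac_of_blocks = exonic_bp / sum(e - s for s, e in a_blocks)  # 10 / 20 = 0.5
frac_of_span = exonic_bp / (a_end - a_start)                  # 10 / 100 = 0.1
unsplit_frac = overlap_bp(a_start, a_end, *b) / (a_end - a_start)  # 90 / 100 = 0.9

print(exonic_bp, frac_of_blocks, frac_of_span, unsplit_frac)
# 10 0.5 0.1 0.9
```

With a threshold of 0.51, this record fails under both split-aware definitions but passes when the blocks are ignored, so the choice of numerator and denominator changes which records are reported.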