intersect using split creates weird behavior in v2.30 compared to v2.28

Open dominik-handler opened this issue 3 years ago • 1 comment

Hi,

I observed some unexpected behavior when running bedtools intersect with -split on single-exon intervals in v2.30. With -split, the -f threshold appears to be ignored: overlaps of only 5 bp are reported even though -f 0.51 is set (maybe similar to #773). In addition, quite a few overlaps are missed when -split is turned on.
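To make the -f expectation concrete, here is a minimal Python sketch of the fraction check for a single-block A interval. It is not bedtools code: the function names `overlap_bp` and `passes_min_fraction` are hypothetical, and it assumes half-open BED coordinates and that -f means "minimum overlap as a fraction of A". The coordinates are taken from the 2L record in the outputs of this issue.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open BED intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def passes_min_fraction(a_start, a_end, b_start, b_end, min_frac):
    """True if the overlap covers at least min_frac of interval A.

    Assumption: A is a single block, so its length is simply end - start.
    """
    a_len = a_end - a_start
    return overlap_bp(a_start, a_end, b_start, b_end) / a_len >= min_frac


# A record 1 from this issue: 2L:11541138-11541166 (28 bp)
a = (11541138, 11541166)

# B feature reported only by v2.30 with -split
print(overlap_bp(*a, 11540637, 11541143))                 # 5
print(passes_min_fraction(*a, 11540637, 11541143, 0.51))  # False

# B feature reported by both versions
print(overlap_bp(*a, 11541141, 11541213))                 # 25
print(passes_min_fraction(*a, 11541141, 11541213, 0.51))  # True
```

Under these assumptions, a 5 bp overlap covers about 18% of a 28 bp interval and should be filtered out by -f 0.51, while a 25 bp overlap (about 89%) should pass.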

input A

2L      11541138        11541166        1       1       +       11541138        11541166        255,0,0 1       28      0
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0
2R      3512790 3512818 3       1       -       3512790 3512818 255,0,0 1       28      0
2R      3890488 3890516 4       1       -       3890488 3890516 255,0,0 1       28      0
Un_CP007073v1   4644    4672    5       1       -       4644    4672    255,0,0 1       28      0
Un_CP007080v1   63195   63223   6       1       +       63195   63223   255,0,0 1       28      0
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0
X       21807157        21807185        8       1       +       21807157        21807185        255,0,0 1       28      0

input B

I could not reduce B to a minimal example that reproduces the behavior 1:1, so I have attached the full BED file below.

result 2.28 with split

bedtools intersect -nobuf -wao -f 0.51 -split -a A.bed -b B.bed
2L      11541138        11541166        1       1       +       11541138        11541166        255,0,0 1       28      0       2L      11541141        11541213        TE::TE:LTR      0       -       25
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2163108 2169452 TE::TE:LTR      0       +       25
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2119244 2245971 mRNA::EX:intron 0       -       28
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2119244 2261828 mRNA::EX:intron 0       -       28
2R      3512790 3512818 3       1       -       3512790 3512818 255,0,0 1       28      0       2R      3506591 3512815 TE::TE:LTR      0       +       25
2R      3890488 3890516 4       1       -       3890488 3890516 255,0,0 1       28      0       2R      3884301 3890513 TE::TE:LTR      0       +       25
Un_CP007073v1   4644    4672    5       1       -       4644    4672    255,0,0 1       28      0       Un_CP007073v1   4577    4669    TE::TE:LTR      0       +       25
Un_CP007080v1   63195   63223   6       1       +       63195   63223   255,0,0 1       28      0       Un_CP007080v1   63198   63356   TE::TE:LTR      0       -       25
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7023094 7031834 mRNA::EX:intron 0       -       28
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7023094 7031834 mRNA::EX:intron 0       -       28
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7024160 7030484 TE::TE:LTR      0       +       25
X       21807157        21807185        8       1       +       21807157        21807185        255,0,0 1       28      0       X       21807160        21807319        TE::TE:LTR      0       -       25

result 2.30 with split

bedtools intersect -nobuf -wao -f 0.51 -split -a A.bed -b B.bed
2L      11541138        11541166        1       1       +       11541138        11541166        255,0,0 1       28      0       2L      11540637        11541143        TE::TE:LTR      0       -       5
2L      11541138        11541166        1       1       +       11541138        11541166        255,0,0 1       28      0       2L      11541141        11541213        TE::TE:LTR      0       -       25
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       .       -1      -1      .       -1      .       0
2R      3512790 3512818 3       1       -       3512790 3512818 255,0,0 1       28      0       .       -1      -1      .       -1      .       0
2R      3890488 3890516 4       1       -       3890488 3890516 255,0,0 1       28      0       .       -1      -1      .       -1      .       0
Un_CP007073v1   4644    4672    5       1       -       4644    4672    255,0,0 1       28      0       .       -1      -1      .       -1      .       0
Un_CP007080v1   63195   63223   6       1       +       63195   63223   255,0,0 1       28      0       Un_CP007080v1   62694   63200   TE::TE:LTR      0       -       5
Un_CP007080v1   63195   63223   6       1       +       63195   63223   255,0,0 1       28      0       Un_CP007080v1   63198   63356   TE::TE:LTR      0       -       25
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       .       -1      -1      .       -1      .       0
X       21807157        21807185        8       1       +       21807157        21807185        255,0,0 1       28      0       X       21807160        21807319        TE::TE:LTR      0       -       25
X       21807157        21807185        8       1       +       21807157        21807185        255,0,0 1       28      0       X       21806656        21807162        TE::TE:LTR      0       -       5

result 2.30 without split

bedtools intersect -nobuf -wao -f 0.51 -a A.bed -b B.bed
2L      11541138        11541166        1       1       +       11541138        11541166        255,0,0 1       28      0       2L      11541141        11541213        TE::TE:LTR      0       -       25
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2163108 2169452 TE::TE:LTR      0       +       25
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2119244 2245971 mRNA::EX:intron 0       -       28
2R      2169427 2169455 2       1       -       2169427 2169455 255,0,0 1       28      0       2R      2119244 2261828 mRNA::EX:intron 0       -       28
2R      3512790 3512818 3       1       -       3512790 3512818 255,0,0 1       28      0       2R      3506591 3512815 TE::TE:LTR      0       +       25
2R      3890488 3890516 4       1       -       3890488 3890516 255,0,0 1       28      0       2R      3884301 3890513 TE::TE:LTR      0       +       25
Un_CP007073v1   4644    4672    5       1       -       4644    4672    255,0,0 1       28      0       Un_CP007073v1   4577    4669    TE::TE:LTR      0       +       25
Un_CP007080v1   63195   63223   6       1       +       63195   63223   255,0,0 1       28      0       Un_CP007080v1   63198   63356   TE::TE:LTR      0       -       25
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7023094 7031834 mRNA::EX:intron 0       -       28
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7023094 7031834 mRNA::EX:intron 0       -       28
X       7030459 7030487 7       1       -       7030459 7030487 255,0,0 1       28      0       X       7024160 7030484 TE::TE:LTR      0       +       25
X       21807157        21807185        8       1       +       21807157        21807185        255,0,0 1       28      0       X       21807160        21807319        TE::TE:LTR      0       -       25
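The comparison between these outputs can also be automated. The following is a rough Python sketch, not part of bedtools, that scans `-wao` output lines and lists (a) rows whose reported overlap is below the requested fraction of A and (b) A records reported without any hit. The function name `audit_wao` is hypothetical. It assumes whitespace-separated fields with no embedded spaces, the overlap count in the last column, the A name in column 4, and single-block A records so that A length is end minus start.

```python
def audit_wao(lines, min_frac):
    """Return (below_threshold, no_hit) for bedtools intersect -wao output.

    below_threshold: list of (a_name, overlap_bp) rows that violate min_frac
    no_hit: list of a_name values reported with an overlap of 0
    """
    below_threshold = []
    no_hit = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        a_len = int(fields[2]) - int(fields[1])  # single-block A assumed
        a_name = fields[3]
        overlap = int(fields[-1])                # -wao appends the overlap in bp
        if overlap == 0:
            no_hit.append(a_name)
        elif overlap / a_len < min_frac:
            below_threshold.append((a_name, overlap))
    return below_threshold, no_hit


# Three rows copied from the v2.30 -split output in this issue
rows = [
    "2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 "
    "2L 11540637 11541143 TE::TE:LTR 0 - 5",
    "2L 11541138 11541166 1 1 + 11541138 11541166 255,0,0 1 28 0 "
    "2L 11541141 11541213 TE::TE:LTR 0 - 25",
    "2R 2169427 2169455 2 1 - 2169427 2169455 255,0,0 1 28 0 "
    ". -1 -1 . -1 . 0",
]

below, missing = audit_wao(rows, 0.51)
print(below)    # [('1', 5)]
print(missing)  # ['2']
```

A no-hit row is only suspicious when another run (here, v2.28 or v2.30 without -split) reports a hit for the same A record, so the `no_hit` list is meant to be compared across runs rather than read on its own.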

B.bed.zip

Thank you, Dominik

dominik-handler, Jul 01 '21 08:07

@dominik-handler I am sorry it has taken me so long to respond to this. This is indeed a bug, and I have found the source of the problem. However, in doing so I realized that it requires some thought about how best to handle fractional overlap when using -split, as there are some interesting corner cases where the appropriate behavior is not entirely clear (at least to me). I need to spend some time thinking about and modeling all of the cases. I hope to track progress in this issue over the coming weeks.

arq5x, Aug 03 '21 15:08