bedtools2 icon indicating copy to clipboard operation
bedtools2 copied to clipboard

Bedtools Intersect — Confusing Behavior when -f -r -v -wa and -wb Flags Used in Various Combinations

Open CharlesARoy opened this issue 3 years ago • 1 comments

Hello,

The Bedtools Intersect documentation states that the -wa -wb and -v flags are "Restricted by -f and -r" but doesn't seem to elaborate on how they are restricted. (An example or two of how these flags work in combination and how they restrict each other would be helpful). The documentation also mentions that the -v flag is "an homage to “grep -v”". However, in grep the -v flag essentially acts as a NOT, but in bedtools, the -v flag doesn't seem to consistently work as a NOT.

For example, I often run into situations where I want only the original a reads or only the original b reads when regions in two bed files either do or do not have some fraction of (reciprocal) overlap.

The way that would seem logical to state these sorts of commands would be to write something like: bedtools intersect -f 0.5 -r -wb -a a_file.bed -b b_file.bed > output Where I would only want the original b_file.bed regions that have 50% reciprocal overlap with the a_file.bed regions. Or: bedtools intersect -f 0.8 -r -v -wb -a a_file.bed -b b_file.bed > output Where I would only want the original b_file.bed regions that do not have 80% reciprocal overlap with the a_file.bed regions. Or: bedtools intersect -f 0.5 -v -wb -a a_file.bed -b b_file.bed > output Where I would only want the original b_file.bed regions when an a_file.bed region does not overlap the b_file.bed region by 50% or more. For the above command, one might be tempted to suggest swapping the files and writing something like the following, but then you only get back the b_file.bed regions that don't overlap the a_file.bed regions, which isn't what I want. bedtools intersect -f 0.5 -v -wa -a b_file.bed -b a_file.bed > output

However, when I use the -v flag with -wa and -wb or even just -wb, all I get back are the original a_file.bed regions, which isn't helpful and sort of seems like a bug, or at least, an area that can be improved. I know that the documentation states that "...if just -wb is used, the overlapping portion of A will be reported followed by the original “B”" but I would suggest making the the -wb flag more consistent with the -wa flag. (I know that I can easily pipe to awk or cut to get just the original b_file.bed regions, but it would still be nice to have a dedicated command).

More importantly; however, I think it would be really helpful to have the -v flag work as negation for all commands. In other words, it would be nice if adding the -v flag would always produce the set for which the condition specified without the -v flag does not hold.

I am currently using Bedtools version 2.29.2.

CharlesARoy avatar Sep 22 '20 05:09 CharlesARoy

Thank you for mentioning here that something is weird with bedtools intersect -v command. I would like to did try to use that as "grep -v" but it also messes up the bam header.

For example, a bam file header that supposed to look like this

samtools view -h input.bam | head
@HD     VN:1.6  SO:coordinate
@SQ     SN:1    LN:158534110
@SQ     SN:2    LN:136231102
@SQ     SN:3    LN:121005158
@SQ     SN:4    LN:120000601
@SQ     SN:5    LN:120089316
@SQ     SN:6    LN:117806340
@SQ     SN:7    LN:110682743
@SQ     SN:8    LN:113319770
@SQ     SN:9    LN:105454467

After going through this... bedtools intersect -abam input.bam -b unwanted_positon.bed -v > clean_input.bam

... the header looks like this

samtools view -h clean_input.bam | head
@SQ     SN:1    LN:158534110
@SQ     SN:2    LN:136231102
@SQ     SN:3    LN:121005158
@SQ     SN:4    LN:120000601
@SQ     SN:5    LN:120089316
@SQ     SN:6    LN:117806340
@SQ     SN:7    LN:110682743
@SQ     SN:8    LN:113319770
@SQ     SN:9    LN:105454467
@SQ     SN:10   LN:103308737

sagitaninta avatar Oct 21 '20 15:10 sagitaninta