
request for 'skip missing values' option in map

Open • Blosberg opened this issue 3 years ago • 1 comment

Thanks very much for contributing this excellent resource to the community. I've been using both the unionbedg and map utilities, and I think both would be much more useful if there were an option to skip missing values.

To explain: suppose I have the following (simplified) ranges file: a_range.bed

chr1	0	10
chr1	10	20
chr1	20	30

And suppose I've already taken three samples of (for example) methylation data and combined them with bedtools unionbedg -i sample_1.bed sample_2.bed sample_3.bed -filler NA > b_union.bed, producing the following:

chr1	11	12	0.3	0.1	NA
chr1	14	15	0.2	0.2	NA
chr1	17	18	NA	0.5	0.9
chr1	23	25	0.5	NA	0.1

Note that every row has at least one numeric value in columns 4-6, but each sample is missing data at some loci where the others have coverage. I then tried to average each of these columns over the ranges in a_range.bed by running bedtools map -a a_range.bed -b b_union.bed -c 4,5,6 -o mean. Unfortunately, this produces the error ***** WARNING: Non numeric value NA. If I removed the lines containing NA with grep -v NA, I would lose the whole line each time, including its valid values (and in the example above, there would be no lines left).
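To make the row-loss problem concrete, here is a minimal Python sketch that mimics the effect of grep -v NA on the example rows. It does not call bedtools; the variable names are made up for illustration, and the rows are simply the b_union.bed example typed in by hand.

```python
# Rows of the example b_union.bed: chrom, start, end, then one value per sample.
rows = [
    ("chr1", 11, 12, "0.3", "0.1", "NA"),
    ("chr1", 14, 15, "0.2", "0.2", "NA"),
    ("chr1", 17, 18, "NA", "0.5", "0.9"),
    ("chr1", 23, 25, "0.5", "NA", "0.1"),
]

# Row-wise filtering: keep a row only if none of its sample values is NA.
# For this data that matches what `grep -v NA` would keep.
kept = [row for row in rows if "NA" not in row[3:]]

print(f"{len(kept)} of {len(rows)} rows survive")
```

Because every row has an NA in at least one sample column, nothing survives, even though each row still carries valid values for the other samples.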

It would be ideal if I could add an option like -omit NA to the map command, telling bedtools to skip specific strings (or skip non-numeric values entirely) for each column independently, while still using the row for the other columns that have valid data. Then I could batch together all kinds of operations on different samples in parallel, and the union file would be much more versatile.
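The requested behaviour can be sketched in plain Python as follows. This is only an illustration of the semantics I have in mind, not how bedtools works internally: the function name map_mean_skip_missing and its parameters are hypothetical, overlaps are assumed to be any overlap between half-open BED intervals, and "." is assumed as the placeholder for a column with no usable values.

```python
def map_mean_skip_missing(a_ranges, b_rows, missing="NA", null="."):
    """Per-column mean of the b values overlapping each a range.

    Values equal to `missing` are skipped column by column, so a row with
    an NA in one sample still contributes to the other samples.
    (Hypothetical helper; not part of bedtools.)
    """
    n_cols = len(b_rows[0][3]) if b_rows else 0
    result = []
    for chrom, start, end in a_ranges:
        # Half-open intervals overlap when each starts before the other ends.
        hits = [vals for c, s, e, vals in b_rows
                if c == chrom and s < end and e > start]
        means = []
        for i in range(n_cols):
            nums = [float(v[i]) for v in hits if v[i] != missing]
            # No numeric value in this column: emit the null placeholder.
            means.append(sum(nums) / len(nums) if nums else null)
        result.append((chrom, start, end, *means))
    return result


# The example files from above, typed in by hand.
a_ranges = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 20, 30)]
b_rows = [
    ("chr1", 11, 12, ["0.3", "0.1", "NA"]),
    ("chr1", 14, 15, ["0.2", "0.2", "NA"]),
    ("chr1", 17, 18, ["NA", "0.5", "0.9"]),
    ("chr1", 23, 25, ["0.5", "NA", "0.1"]),
]

result = map_mean_skip_missing(a_ranges, b_rows)
for row in result:
    print(*row, sep="\t")
```

With this data, the range chr1 10-20 would average 0.3 and 0.2 for sample 1, all three values for sample 2, and only 0.9 for sample 3, while chr1 0-10 gets the placeholder in every column because nothing overlaps it.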

Blosberg, May 05 '21 08:05

This is a great idea. I will look into it.

arq5x, May 25 '21 16:05