bedtools2
request for 'skip missing values' option in map
Thanks very much for contributing this excellent resource to the community.
I've been using both the unionbedg and map utilities, and I think both would be much more useful if there were an additional option to skip missing values.
To explain: suppose I have the following (simplified) ranges file, a_range.bed:
chr1 0 10
chr1 10 20
chr1 20 30
And suppose I've already taken three samples of methylation data (for example) and combined them using bedtools unionbedg -i sample_1.bed sample_2.bed sample_3.bed -filler NA > b_union.bed into the following:
chr1 11 12 0.3 0.1 NA
chr1 14 15 0.2 0.2 NA
chr1 17 18 NA 0.5 0.9
chr1 23 25 0.5 NA 0.1
Note that every row has at least one numeric value in columns 4-6, but each sample is missing data at some loci where the others have coverage. I then tried to average over the ranges in a_range.bed for each of the columns by doing this: bedtools map -a a_range.bed -b b_union.bed -c 4,5,6 -o mean. Unfortunately, this produces the error ***** WARNING: Non numeric value NA. If I removed the lines containing NA with grep -v NA, I would lose the whole line each time (and in the above example, there would be no lines left).
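One possible workaround, sketched here in Python rather than as anything bedtools provides, is to split the union file into one track per sample and drop the NA entries from each track separately, so that a missing value in one sample does not discard the other samples' data. The function name split_union_by_sample and its parameters are hypothetical, chosen only for this illustration.

```python
def split_union_by_sample(lines, n_samples, missing="NA"):
    """Split unionbedg-style rows into one 4-column track per sample.

    Hypothetical helper: rows whose value equals `missing` are dropped
    from that sample's track only, not from the other samples' tracks.
    """
    per_sample = [[] for _ in range(n_samples)]
    for line in lines:
        fields = line.split()
        if len(fields) < 3 + n_samples:
            continue  # skip blank or malformed lines
        chrom, start, end = fields[:3]
        for i, value in enumerate(fields[3:3 + n_samples]):
            if value != missing:
                per_sample[i].append(f"{chrom}\t{start}\t{end}\t{value}")
    return per_sample


# The example union file from this report.
union = """\
chr1 11 12 0.3 0.1 NA
chr1 14 15 0.2 0.2 NA
chr1 17 18 NA 0.5 0.9
chr1 23 25 0.5 NA 0.1
""".splitlines()

tracks = split_union_by_sample(union, n_samples=3)
for i, track in enumerate(tracks, start=1):
    print(f"sample {i}: {len(track)} rows kept")
```

Each track could then be written to its own file and passed to bedtools map with -c 4 -o mean, and the resulting columns pasted back together. This works, but it needs one map call per sample, which is the overhead the request is trying to avoid.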
It would be ideal if I could add an option like -omit NA to the map command to tell bedtools to skip specific strings within the mapping procedure for each column independently (or skip non-numeric values entirely), while still preserving the row for other columns that have valid data. Then I could batch together all kinds of operations on different samples in parallel and the union file would be much more versatile.
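To make the requested behavior concrete, here is a minimal Python sketch of the semantics I have in mind. It is not bedtools code, and the function name map_mean_skip_missing is hypothetical. It assumes half-open BED intervals and an unweighted mean over the values of overlapping rows, and it uses None where a range has no usable value in a column.

```python
def map_mean_skip_missing(ranges, union_rows, n_cols, missing="NA"):
    """Per-column mean over overlapping rows, skipping `missing` per column.

    Hypothetical illustration of the requested option; a missing value
    in one column does not exclude the row from the other columns.
    """
    results = []
    for chrom, start, end in ranges:
        sums = [0.0] * n_cols
        counts = [0] * n_cols
        for row in union_rows:
            b_chrom, b_start, b_end = row[0], int(row[1]), int(row[2])
            # Half-open intervals: no overlap if one ends before the other starts.
            if b_chrom != chrom or b_end <= start or b_start >= end:
                continue
            for i, value in enumerate(row[3:3 + n_cols]):
                if value == missing:
                    continue  # skip this column only
                sums[i] += float(value)
                counts[i] += 1
        means = [sums[i] / counts[i] if counts[i] else None for i in range(n_cols)]
        results.append((chrom, start, end, means))
    return results


# a_range.bed and b_union.bed from the example above.
ranges = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 20, 30)]
union_rows = [
    ["chr1", "11", "12", "0.3", "0.1", "NA"],
    ["chr1", "14", "15", "0.2", "0.2", "NA"],
    ["chr1", "17", "18", "NA", "0.5", "0.9"],
    ["chr1", "23", "25", "0.5", "NA", "0.1"],
]

results = map_mean_skip_missing(ranges, union_rows, n_cols=3)
for chrom, start, end, means in results:
    print(chrom, start, end, means)
```

With this behavior, the range chr1 10-20 would average two values for the first sample, three for the second, and one for the third, instead of failing on the first NA.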
This is a great idea. I will look into it.