
ENH: implement new tool bedtools sample

Open brentp opened this issue 11 years ago • 0 comments

This would be useful for assessing the significance of overlap (along with shuffle).

It would have a command-line interface like:

bedtools sample -i stdin -n 500 > out.500.bed

This could be done with sort -R, nl, or something similar, but those tools are not available on most systems that I work with.

This is very easy to implement with reservoir sampling. Here's a Python version, mostly taken from Wikipedia:

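    import random
    import sys

    n = 500          # example sample size (the proposed -n)
    bed = sys.stdin  # input intervals, one per line (the proposed -i stdin)

    lines = []
    for i, line in enumerate(bed):
        if i < n:
            # fill the reservoir with the first n lines
            lines.append(line)
        else:
            # keep line i with probability n / (i + 1)
            replace_idx = random.randint(0, i)
            if replace_idx < n:
                lines[replace_idx] = line
    sys.stdout.write("".join(lines))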

The following may belong in a completely separate tool, but it would be a useful addition to a suite like bedtools.

It may also be nice to shuffle labels, e.g.

bedtools label_shuffle -col 4 -i stdin

So instead of shuffling positions, we shuffle the labels in the specified column. Again, this can be useful for significance testing.
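A rough sketch of what label shuffling could look like in Python, assuming tab-separated input and a 1-based column index as in the proposed `-col 4`. The function name `shuffle_labels` and its `seed` parameter are made up for illustration and are not part of bedtools:

```python
import random

def shuffle_labels(lines, col=4, seed=None):
    """Permute the values of one column across rows, leaving other columns alone.

    Hypothetical helper: `col` is 1-based; input is assumed tab-separated,
    and every line is assumed to have at least `col` fields.
    """
    rng = random.Random(seed)  # seed only to make a run reproducible
    rows = [line.rstrip("\n").split("\t") for line in lines]
    labels = [row[col - 1] for row in rows]
    rng.shuffle(labels)  # in-place random permutation of the labels
    for row, label in zip(rows, labels):
        row[col - 1] = label
    return ["\t".join(row) + "\n" for row in rows]

# Small in-memory example; positions stay put, only column 4 is permuted.
bed = [
    "chr1\t10\t20\tgeneA\n",
    "chr1\t30\t40\tgeneB\n",
    "chr2\t5\t15\tgeneC\n",
]
shuffled = shuffle_labels(bed, col=4, seed=1)
print("".join(shuffled), end="")
```

Unlike the reservoir sampler, this holds the whole input in memory, since every label has to be seen before it can be reassigned. Passing `sys.stdin` as `lines` would give the `-i stdin` behavior.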

brentp · May 30 '13 15:05