pybedtools

Does pybedtools `closest` support multiple databases like bedtools?

Open · xguse opened this issue · 3 comments

I am trying to run something like this:

$ cat a.bed
chr1  10  20  a1  1 -

$ cat b1.bed
chr1  5   6   b1.1  1 -
chr1  30  40  b1.2  2 +

$ cat b2.bed
chr1  0   1   b2.1  1 -
chr1  21  22  b2.2  2 +

$ bedtools closest -a a.bed -b b1.bed b2.bed -mdb each -d
chr1  10  20  a1  1 - 1 chr1  5   6   b1.1  1 - 5
chr1  10  20  a1  1 - 2 chr1  21  22  b2.2  2 + 2
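
To make the `-mdb each -d` semantics concrete, here is a small pure-Python sketch (not pybedtools or bedtools code) that finds the closest interval to `a1` in each database separately and reports its distance. The distance rule (0 for overlaps, otherwise the gap + 1 in BED's 0-based half-open coordinates) is an assumption, chosen because it reproduces the distances 5 and 2 in the output above; ties, strand, and `-D` are ignored.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: str


def distance(a: Interval, b: Interval) -> int:
    """Assumed distance rule: 0 if the intervals overlap, else gap + 1."""
    if b.end <= a.start:   # b lies to the left of a
        return a.start - b.end + 1
    if b.start >= a.end:   # b lies to the right of a
        return b.start - a.end + 1
    return 0               # overlapping intervals


def closest_each(a, databases):
    """Return (db_index, closest_b, distance) for EACH database, like -mdb each."""
    results = []
    # Databases are numbered from 1, matching the index column in the output above.
    for idx, db in enumerate(databases, start=1):
        same_chrom = [b for b in db if b.chrom == a.chrom]
        if not same_chrom:
            continue  # nothing on this chromosome in this database
        best = min(same_chrom, key=lambda b: distance(a, b))
        results.append((idx, best, distance(a, best)))
    return results


a1 = Interval("chr1", 10, 20, "a1")
b1 = [Interval("chr1", 5, 6, "b1.1"), Interval("chr1", 30, 40, "b1.2")]
b2 = [Interval("chr1", 0, 1, "b2.1"), Interval("chr1", 21, 22, "b2.2")]

for idx, b, d in closest_each(a1, [b1, b2]):
    print(a1.name, idx, b.name, d)
# a1 1 b1.1 5
# a1 2 b2.2 2
```

Without `-mdb each`, bedtools would instead pick the single closest hit across all databases combined (here `b2.2` at distance 2).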

This is what I am doing:

k_nearest = snp_bed.closest(
    [gene_model_subtracted_bed, genes_only_sorted_bed],
    k=k_number,
    names=['novel_mapped_tx', 'official_annotations'],
    D='ref',     # Include SIGNED distances from the SNP, based on the ref genome
    t='all',     # Return all members of a distance "tie"
    mdb='each',  # Return `k_number` neighbors for EACH database in `names`
)

This is the error I am getting:

pybedtools.helpers.BEDToolsError:
Command was:

        bedtools closest -t all -names novel_mapped_tx official_annotations -mdb each -k 10 -b /tmp/pybedtools.eieijdz9.tmp -a snp_bed.bed -D ref

Error message was:

***** ERROR: Number of database name tags given does not match number of databases. *****
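
The error comes from bedtools itself: the command above passes two `-names` but only one `-b` file (a single temporary file). The sketch below uses a hypothetical helper, `closest_argv` (an illustration, not a pybedtools function), that builds a `bedtools closest` argument list with one `-b` entry per database and checks the name count before anything runs. It only assembles strings and uses only flags that appear in this thread.

```python
def closest_argv(a, b_files, names=None, **options):
    """Build an argv list for `bedtools closest` with several -b databases.

    Hypothetical helper for illustration; it does not run bedtools.
    Options set to True become bare flags (d=True -> -d); False/None are skipped.
    """
    if isinstance(b_files, str):
        b_files = [b_files]  # one path, NOT a space-separated list of paths
    if names is not None and len(names) != len(b_files):
        # Same mismatch bedtools reports, caught before running anything.
        raise ValueError(f"{len(names)} names given for {len(b_files)} databases")
    argv = ["bedtools", "closest", "-a", a, "-b", *b_files]
    if names is not None:
        argv += ["-names", *names]
    for key, value in options.items():
        if value is False or value is None:
            continue  # option switched off
        if value is True:
            argv.append(f"-{key}")
        else:
            argv += [f"-{key}", str(value)]
    return argv


print(" ".join(closest_argv("a.bed", ["b1.bed", "b2.bed"], mdb="each", d=True)))
# bedtools closest -a a.bed -b b1.bed b2.bed -mdb each -d
```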

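Is it possible to pass multiple "B" files in pybedtools? It seems to accept only a single `b` argument: in the command above, the list ended up as one temporary `-b` file, so bedtools sees one database but two names.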

Any help would be awesome!

Gus

PS: I also tried passing a single string containing both file paths, but pybedtools tried to open it as one path, which of course fails.

xguse, Mar 11 '16 22:03

As of v0.7.5 this works (see also #156), but only if the list contains filename strings rather than BedTool objects. I will fix this so it detects BedTool objects as well, and while I'm at it, I might as well support mixes of BedTool objects and string filenames.

In the meantime, try modifying your example to:

k_nearest = snp_bed.closest(
    [
        gene_model_subtracted_bed.fn,
        genes_only_sorted_bed.fn,
    ],
    k=k_number,
    names=['novel_mapped_tx', 'official_annotations'],
    D='ref',     # Include SIGNED distances from the SNP, based on the ref genome
    t='all',     # Return all members of a distance "tie"
    mdb='each',  # Return `k_number` neighbors for EACH database in `names`
)
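
Until lists of BedTool objects are supported directly, the `.fn` workaround can be generalized to mixed lists. The sketch below uses a hypothetical helper, `to_filenames` (not part of pybedtools), that duck-types on the `fn` attribute used above and passes plain strings and `os.PathLike` paths through unchanged. It assumes each BedTool is backed by a file on disk; a streaming BedTool has no usable filename and would need to be saved first (for example with `.saveas()`).

```python
import os


def to_filenames(items):
    """Convert a mix of BedTool-like objects and paths into filename strings.

    Hypothetical helper: objects with a string `fn` attribute (as file-based
    BedTool objects have) are replaced by that filename; str and os.PathLike
    values are treated as paths.
    """
    filenames = []
    for item in items:
        fn = getattr(item, "fn", None)
        if fn is not None:
            if not isinstance(fn, str):
                # e.g. a streaming BedTool whose `fn` is an iterator, not a path
                raise TypeError("BedTool is not file-based; save it to a file first")
            filenames.append(fn)
        elif isinstance(item, (str, os.PathLike)):
            filenames.append(os.fspath(item))
        else:
            raise TypeError(f"cannot get a filename from {type(item).__name__}")
    return filenames
```

With pybedtools and bedtools installed, the call above would then become `snp_bed.closest(to_filenames([gene_model_subtracted_bed, genes_only_sorted_bed]), k=k_number, ...)` with the same keyword arguments.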

I'll keep this issue open until I add support for lists of BedTool objects.

daler, Mar 11 '16 23:03

You rock.

Thanks as always for the speedy and helpful response.

Gus

xguse, Mar 11 '16 23:03

I second this feature request.

SHuang-Broad, Jun 12 '19 17:06