biomartpy
Simple interface to BioMart (Python -> rpy2 -> R/BioConductor's biomaRt)
A simple interface for accessing BioMart from Python (Python -> rpy2 -> R's
biomaRt -> pandas.DataFrame). It was originally written to build a lookup
table mapping gene IDs to various attributes for downstream work.
Install from PyPI::
$ pip install biomartpy
Or from GitHub::
$ git clone git@github.com:daler/biomartpy.git
$ cd biomartpy
$ python setup.py develop
Choose a mart (use list_marts() to decide)::
>>> mart_name = 'ensembl'
Choose a dataset (use list_datasets(mart_name) to decide)::
>>> dataset = 'dmelanogaster_gene_ensembl'
Choose some attributes (use list_attributes(mart_name, dataset) to decide)::
>>> attributes = ['flybase_gene_id', 'flybasename_gene', 'description']
Get a pandas.DataFrame as a lookup table, indexed by the first attribute in
the provided list::
>>> df = make_lookup(mart_name, dataset, attributes=attributes)
Use .loc to extract rows (older pandas versions also offered .ix, which was
removed in pandas 1.0)::
>>> df.loc['FBgn0031209']
flybasename_gene                                                Ir21a
description         Ionotropic receptor 21a [Source:FlyBase gene n...
Name: FBgn0031209
>>> df.loc['FBgn0031209']['flybasename_gene']
'Ir21a'
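To make "indexed by the first attribute" concrete, here is a minimal pandas-only sketch. It does not call biomartpy or BioMart: the rows are copied from the filtered example output in this README, and ``build_lookup`` is a hypothetical helper, not part of biomartpy. It assumes a recent pandas, where ``.loc`` is the label-based indexer.

```python
import pandas as pd


def build_lookup(rows, attributes):
    """Hypothetical helper: tabulate rows and index by the first attribute."""
    df = pd.DataFrame(rows, columns=attributes)
    return df.set_index(attributes[0])


attributes = ['flybase_gene_id', 'flybasename_gene', 'chromosome_name']

# Rows taken from the example output in this README, not fetched from BioMart.
rows = [
    ('FBgn0002121', 'l(2)gl', '2L'),
    ('FBgn0031208', 'CG11023', '2L'),
    ('FBgn0031209', 'Ir21a', '2L'),
    ('FBgn0051973', 'Cda5', '2L'),
]

lookup = build_lookup(rows, attributes)

# Select a whole row by gene ID, or a single cell by (gene ID, column).
row = lookup.loc['FBgn0031209']
name = lookup.loc['FBgn0031209', 'flybasename_gene']
```

Passing the row label and column name together to ``.loc`` avoids the chained indexing of ``df.loc[...][...]`` and returns the same value.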
When providing filters and values, you can supply them either the way R expects (filters is a list, and values is a list of lists with one list per filter) or as a more convenient dictionary. Here, the query is restricted to these four IDs, and only on chromosome 2L::
>>> filters = {
... 'flybase_gene_id': ['FBgn0031208', 'FBgn0002121', 'FBgn0031209', 'FBgn0051973'],
... 'chromosome_name': ['2L']}
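The two forms carry the same information. As a rough illustration (this is not biomartpy's actual code, and ``split_filters`` is a hypothetical name), a dictionary can be unpacked into the list and list-of-lists layout described above:

```python
def split_filters(filters):
    """Hypothetical helper: dict -> (filter names, one list of values per filter)."""
    names = list(filters)
    values = [list(filters[name]) for name in names]
    return names, values


filters = {
    'flybase_gene_id': ['FBgn0031208', 'FBgn0002121', 'FBgn0031209', 'FBgn0051973'],
    'chromosome_name': ['2L'],
}

# names[i] is the filter whose allowed values are values[i].
names, values = split_filters(filters)
```

The dictionary keeps each filter next to its values, so the two parallel lists cannot drift out of step.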
Set up attributes (here, chromosome_name is included to check that the
results are correct, but attributes and filters do not have to match)::
>>> attributes = ['flybase_gene_id', 'flybasename_gene', 'chromosome_name']
Get data::
>>> df = make_lookup(
... mart_name=mart_name,
... dataset=dataset,
... attributes=attributes,
... filters=filters)
Check results::
>>> df
                flybasename_gene chromosome_name
flybase_gene_id
FBgn0002121               l(2)gl              2L
FBgn0031208              CG11023              2L
FBgn0031209                Ir21a              2L
FBgn0051973                 Cda5              2L
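One way such a table might be used downstream is to translate a list of gene IDs into names. The sketch below is pandas-only and rebuilds the table above by hand instead of querying BioMart. The ``gene_ids`` list is a hypothetical input chosen for illustration.

```python
import pandas as pd

# The lookup table from the example above, rebuilt by hand.
lookup = pd.DataFrame(
    {
        'flybasename_gene': ['l(2)gl', 'CG11023', 'Ir21a', 'Cda5'],
        'chromosome_name': ['2L', '2L', '2L', '2L'],
    },
    index=pd.Index(
        ['FBgn0002121', 'FBgn0031208', 'FBgn0031209', 'FBgn0051973'],
        name='flybase_gene_id',
    ),
)

# Hypothetical IDs of interest from some downstream analysis.
gene_ids = ['FBgn0031209', 'FBgn0002121']

# A plain dict gives fast ID -> name translation; unknown IDs map to None.
id_to_name = lookup['flybasename_gene'].to_dict()
names = [id_to_name.get(gene_id) for gene_id in gene_ids]
```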