Parsing bank/credit card transactions, and scraping transactions from aggregators.
GrockTX
GrockTX is a library for parsing and scraping bank transaction data. Its primary function is to obtain and parse the memo strings that appear on bank and credit card statements, such as::

    11-14-09 HAMILTON TRUE VALUE HD DORCHESTER MA auth# 86673

GrockTX provides two main libraries: ``grocktx.parser`` and
``grocktx.scraper``.

- ``grocktx.parser`` parses transaction memo strings. It picks the transaction apart to identify the channel (e.g. point-of-sale, check, transfer), the city, state, phone number and zip (if available), and the descriptive portion (e.g. "HAMILTON TRUE VALUE HD").
- ``grocktx.scraper`` scrapes major bank account aggregators (currently mint.com and wesabe.com) to get personal transaction data, and parses it with ``grocktx.parser``.

The libraries are especially useful when combined with the GrockTX tagging server at https://grocktx.media.mit.edu, which provides user-supplied metadata about the transactions to understand better what they represent. Documentation of the API for this can be found at https://grocktx.media.mit.edu/api
``grocktx.scraper`` depends on pycurl and BeautifulSoup. ``grocktx.parser``
has no external dependencies beyond Python 2.5.
Installation
Install using ``python setup.py install``. Alternatively, install with pip
using the following requirements entry::

    -e git://github.com/yourcelf/grocktx.git#egg=grocktx

grocktx.parser
``grocktx.parser`` defines one public function::

    parse(memo, approx_date=None)

``memo`` is a memo string, such as appears on bank and credit card statements.
``approx_date`` is an optional parameter to use to interpret dates on memo
strings that lack year indicators, such as this one::

    WITHDRAW# - POS 1128 1756 531470 HARVEST COOP CAMBRIDGE MA

If ``approx_date`` is not provided, the parser uses the year that places the
date closest to the current date.
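
The year-inference rule can be illustrated with a short sketch. This is not the library's implementation: ``closest_year`` is a hypothetical helper, written for current Python versions, that tries the years around a reference date and keeps the one whose resulting date is nearest. Reading ``1128`` in the memo above as November 28 is also an assumption.

.. code-block:: python

    from datetime import date


    def closest_year(month, day, reference=None):
        """Return the year that puts (month, day) closest to ``reference``.

        Hypothetical helper for illustration; not part of grocktx.
        """
        if reference is None:
            reference = date.today()
        best = None
        for year in (reference.year - 1, reference.year, reference.year + 1):
            try:
                candidate = date(year, month, day)
            except ValueError:
                # Skip impossible dates, e.g. February 29 in a non-leap year.
                continue
            distance = abs((candidate - reference).days)
            if best is None or distance < best[0]:
                best = (distance, year)
        if best is None:
            raise ValueError("no valid date for month=%r, day=%r" % (month, day))
        return best[1]

For instance, ``closest_year(11, 28, date(2010, 1, 5))`` returns ``2009``, since November 28, 2009 is closer to the reference date than November 28, 2010.
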
Example:

.. code-block:: python

    >>> from grocktx.parser import parse
    >>> parse("11-14-09 HAMILTON TRUE VALUE HD DORCHESTER MA auth# 86673")
    {
        'channel': 'pos',
        'channel_details': {
            'auth': "86673",
            'auth_date': "2009-11-14"
        },
        'vendor': {
            'description': "HAMILTON TRUE VALUE HD",
            'city': "DORCHESTER",
            'state': "MA",
            'zip': "",
            'phone': ""
        }
    }

For more examples of the supported transaction formats and their return
results, see ``grocktx/tests.py``.
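
Because the result is a plain dict, downstream code can read its fields directly. The sketch below works on a hard-coded copy of the result shown above rather than calling the library. ``format_vendor`` is a hypothetical helper, and treating empty or missing values as "not available" is an assumption based on that example.

.. code-block:: python

    def format_vendor(result):
        """Build a one-line label from a parsed transaction dict.

        Hypothetical helper for illustration; not part of grocktx.
        """
        vendor = result.get("vendor", {})
        # Keep only the location parts that are present and non-empty.
        location = ", ".join(
            part for part in (vendor.get("city"), vendor.get("state")) if part
        )
        description = vendor.get("description") or "(no description)"
        if location:
            return "%s (%s)" % (description, location)
        return description


    # Copy of the documented example result, not live parser output.
    parsed = {
        "channel": "pos",
        "channel_details": {"auth": "86673", "auth_date": "2009-11-14"},
        "vendor": {
            "description": "HAMILTON TRUE VALUE HD",
            "city": "DORCHESTER",
            "state": "MA",
            "zip": "",
            "phone": "",
        },
    }
    label = format_vendor(parsed)

Here ``label`` is ``"HAMILTON TRUE VALUE HD (DORCHESTER, MA)"``.
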
grocktx.scraper
``grocktx.scraper`` defines one public function::

    get_transactions(provider, username, password)

``provider`` should be one of ``'mint'`` or ``'wesabe'``, and ``username`` and
``password`` should be the username and password for the chosen provider.
Example:

.. code-block:: python

    >>> from grocktx.scraper import get_transactions
    >>> get_transactions("wesabe", "myusername", "mypassword")

This function returns a JSON-serializable list of dicts containing the transaction data. Each transaction has the following form::

    [
        {
            'unique_id': alphanumeric string that is unique for this
                         transaction. May not be identical across providers,
                         but will be identical within a provider.
            'channel': string,
            'channel_details': {
                'check_number': present if channel is a check
                'auth': present if channel is POS or ATM
                'auth_date': YYYY-MM-DD, present if available
                'auth_time': HH:MM, present if available
            },
            'data_source': string, one of "wesabe", "mint", ...,
            'amount': float, debits negative, credits positive
            'date': YYYY-MM-DD,
            'vendor': {
                'description': parsed remainder of memo string, or none (as
                               in a check)
                'city': city if available, or none
                'state': state if available, or none
                'zip': zip if available, or none
                'phone': phone if available, or none
            },
            'raw': {
                // raw fields from provider (e.g. wesabe, mint), values vary by
                // provider. Currently:
                // mint.com:
                date: YYYY-MM-DD string
                description: string
                original_description: string
                amount: float
                transaction_type: string
                category: string
                account_name: string
                labels: string
                notes: string
                // wesabe.com
                guid: string
                account_id: integer
                date: YYYY-MM-DD string
                original_date: YYYY-MM-DD string
                amount: float
                display_name: string
                check_number: integer
                raw_name: string
                raw_txntype: string
                memo: string
                transfer_guid: string
                merchant_id: integer
                merchant_name: string
                tags: string
            }
        },
        ...
    ]

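
As a rough illustration of consuming this structure, the sketch below sums amounts per channel and round-trips the records through JSON. It does not call the scraper: the sample records are made up, show only a few of the documented keys, and the channel strings ``"pos"`` and ``"check"`` are assumptions. ``total_by_channel`` is a hypothetical helper written for current Python versions.

.. code-block:: python

    import json
    from collections import defaultdict


    def total_by_channel(transactions):
        """Sum transaction amounts per channel.

        Hypothetical helper for illustration; not part of grocktx.
        """
        totals = defaultdict(float)
        for tx in transactions:
            totals[tx["channel"]] += tx["amount"]
        return dict(totals)


    # Made-up sample records in the documented shape (subset of keys).
    sample = [
        {"unique_id": "a1", "channel": "pos", "amount": -12.5, "date": "2009-11-14"},
        {"unique_id": "a2", "channel": "pos", "amount": -7.5, "date": "2009-11-15"},
        {"unique_id": "a3", "channel": "check", "amount": -100.0, "date": "2009-11-16"},
    ]

    totals = total_by_channel(sample)

    # The records hold only dicts, lists, strings and numbers,
    # so they survive a JSON round trip unchanged.
    restored = json.loads(json.dumps(sample))

With the sample above, ``totals`` is ``{"pos": -20.0, "check": -100.0}`` and ``restored`` equals ``sample``. Debits are negative, as described in the ``amount`` field.
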