smart_importer
Allow custom getters for attributes
This is something of a follow-up to https://github.com/beancount/smart_importer/issues/45
To avoid a proliferation of metadata fields, I would like to keep my training metadata in a single field. Instead of having original_narration, original_payee and category, I would like to have something like:
__train__: "Amazon,Point Of Sale Withdrawal Amazon web service aws.amazon.coWAUS TLR:M03 / DRAWER:803,Shops;Digital Purchase"
I would then like to define a pipeline attribute getter that parses this combined meta field, for example:
```python
from sklearn.pipeline import make_pipeline
from smart_importer.pipelines import Getter, StringVectorizer
from smart_importer.predictor import EntryPredictor

class MyGetter(Getter):
    """Return column `idx` of the combined __train__ metadata field."""
    def __init__(self, idx, delim=','):
        self.delim = delim
        self.idx = idx

    def _getter(self, txn):
        # Entries without the field (or with too few columns) yield "".
        columns = txn.meta.get("__train__", "").split(self.delim)
        return columns[self.idx] if self.idx < len(columns) else ""

def MyPipeline(idx, delim=','):
    return make_pipeline(MyGetter(idx, delim), StringVectorizer())

class PredictPayees(EntryPredictor):
    """Predicts payees."""
    attribute = "payee"
    pipeline_getters = {"payee": MyPipeline(0), "narration": MyPipeline(1), "category": MyPipeline(2)}
    weights = {"narration": 0.8, "payee": 0.5, "category": 0.5, "date.day": 0.1}
```
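One caveat with splitting on a plain delimiter: a narration that itself contains a comma shifts every later column. A minimal sketch, assuming Python's standard `csv` module is an acceptable encoding for the `__train__` value, quotes such a column when writing and parses it back when reading. `Txn`, `make_train_field` and `MetaFieldGetter` are hypothetical names; `MetaFieldGetter` only mimics the `fit`/`transform` shape of the getter above so that it runs without beancount or scikit-learn.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Txn:
    """Hypothetical stand-in for a beancount Transaction; only `meta` is used."""
    meta: dict = field(default_factory=dict)


def make_train_field(columns, delim=","):
    """Join columns into one metadata string, quoting any column containing delim."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delim).writerow(columns)
    # csv.writer terminates rows with "\r\n" by default; drop it for a meta value.
    return buf.getvalue().rstrip("\r\n")


class MetaFieldGetter:
    """Hypothetical getter: return column `idx` of the combined metadata field."""

    def __init__(self, idx, key="__train__", delim=","):
        self.idx = idx
        self.key = key
        self.delim = delim

    def fit(self, *_args, **_kwargs):
        return self  # stateless: nothing to learn from the training data

    def _getter(self, txn):
        raw = txn.meta.get(self.key, "")
        # csv.reader honours quoting, so "a, b" stays one column.
        columns = next(csv.reader([raw], delimiter=self.delim), [])
        # A missing field or too few columns yields "" instead of raising.
        return columns[self.idx] if self.idx < len(columns) else ""

    def transform(self, txns):
        return [self._getter(txn) for txn in txns]


value = make_train_field(["Amazon", "Point Of Sale, aws.amazon.co", "Shops;Digital Purchase"])
print(value)  # Amazon,"Point Of Sale, aws.amazon.co",Shops;Digital Purchase

txns = [Txn({"__train__": value}), Txn()]
print(MetaFieldGetter(1).fit(txns).transform(txns))  # ['Point Of Sale, aws.amazon.co', '']
```

The issue does not say whether newly imported entries also carry `__train__`, so the empty-string fallback for entries without it is a defensive assumption rather than required behaviour.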
I don't think the proposal results in much code duplication, and the only internal change necessary is to add a pipeline_getters attribute to EntryPredictor and change the define_pipeline method of EntryPredictor, i.e. this line:
```python
transformers.append((attribute, get_pipeline(attribute)))
```
Becomes:
```python
pipeline = self.pipeline_getters.get(attribute, get_pipeline(attribute))
transformers.append((attribute, pipeline))
```
This greatly increases the flexibility of feature extraction, since you can define custom logic based on multiple fields, while the only real internal change is the new pipeline_getters attribute.
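The proposed lookup can be sketched without scikit-learn. `EntryPredictorSketch`, `PredictPayeesSketch`, `default_pipeline` and the string placeholders standing in for real pipelines are all hypothetical. One design note: `dict.get(attribute, get_pipeline(attribute))` evaluates `get_pipeline(attribute)` before the lookup, so a default pipeline is built even for attributes that have a custom one; checking for `None` first avoids that.

```python
from typing import Dict, List, Tuple

# Records which attributes had a default pipeline built, to show the lazy fallback.
BUILT_DEFAULTS: List[str] = []


def default_pipeline(attribute: str) -> str:
    """Hypothetical stand-in for get_pipeline(attribute); returns a placeholder."""
    BUILT_DEFAULTS.append(attribute)
    return f"default<{attribute}>"


class EntryPredictorSketch:
    """Hypothetical, trimmed-down predictor showing only the transformer lookup."""

    weights: Dict[str, float] = {}
    # Proposed attribute: attribute name -> custom pipeline overriding the default.
    pipeline_getters: Dict[str, object] = {}

    def build_transformers(self) -> List[Tuple[str, object]]:
        transformers = []
        for attribute in self.weights:
            pipeline = self.pipeline_getters.get(attribute)
            if pipeline is None:
                # Only build the default when no custom pipeline is registered.
                pipeline = default_pipeline(attribute)
            transformers.append((attribute, pipeline))
        return transformers


class PredictPayeesSketch(EntryPredictorSketch):
    weights = {"narration": 0.8, "category": 0.5, "date.day": 0.1}
    pipeline_getters = {"category": "custom<category>"}


transformers = PredictPayeesSketch().build_transformers()
print(transformers)
# [('narration', 'default<narration>'), ('category', 'custom<category>'), ('date.day', 'default<date.day>')]
print(BUILT_DEFAULTS)
# ['narration', 'date.day']
```

Attributes without an entry in pipeline_getters keep the existing behaviour, so predictors that never set the attribute are unaffected.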
@tarioch and @yagebu, what do you think? (You've both been involved in past pipeline refactorings.)
My opinion: thumbs up, custom getters sound reasonable, why not. Pull request welcome, thx!
Long time no hear... shall we close this issue?