moneyguru icon indicating copy to clipboard operation
moneyguru copied to clipboard

Intelligent import autofill

Open hsoft opened this issue 11 years ago • 57 comments

When importing, it's always the same. There are some meaningless descriptions and they are always replaced by the same thing. There should be way to do that automatically.

First, for this feature to work, it must be possible to edit descriptions, payee and checkno in the import table. The moneyGuru document would maintain a list of all replacement that were made. For example, if, in the import table, I change "MEANINGLESS CRAP BLAH ** BLAH" to "Groceries", moneyGuru will remember it.

Then, there will be the "Smart Replace" (or some cleverer name) button in the import window that, as soon as there's one value that can be replaced, will be enabled. When clicked, a menu with all possible replacements will pop up (there will also be a "Perform all" item). When the user clicks on an item, a replacement occurs.

These replacements will be saved with the moneyGuru file.

hsoft avatar Jun 22 '13 16:06 hsoft

[not-tagged:"suggestion" tagged:"feature" bulk edit command]

hsoft avatar Jun 22 '13 16:06 hsoft

There's a problem with the way I planned this. Smart replace should also cover account transfers, which rules out the edit-in-import-table concept. I haven't thought this through yet, but the first idea to come to mind would be to remember values at import for each imported transaction, so that we can identify them later and make correct replacements.

hsoft avatar Jun 22 '13 16:06 hsoft

A request for this on GS

Import rules would be very nice. I import all using CSV files, and most of the transactions are recurrent (groceries at the same store once a week for example), it would be nice to be able to automatically set the "To" here. It would be also nice to be able to set "From" and "Payee". The rules could be based on the "description", "payee", "to" and "from" field.

hsoft avatar Jun 22 '13 16:06 hsoft

Another request on GS

I like your program (MoneyGuru), but I think it may be improved by automatic assignment of categories based on descriptions. I import all my data from from CSV, and have to assign categories manually, which is very long and tiring. Is it possible to assign a category automatically to the same string? For example, if I once assigned category "salary" to the description string with my company details, the program "remembers" it and automatically assigns this category every time I import data. A further step could be regular expressions defined by user to assign categories to multiple transactions. For example, if I shopped in different Tesco's, the strings will contain this name (Tesco) and they can easily be assigned a user-defined category ("shopping" or "food" or whatever).

hsoft avatar Jun 22 '13 16:06 hsoft

Smart replace should also cover account transfers ... the first idea to come to mind would be to remember values at import for each imported transaction, so that we can identify them later and make correct replacements.

The CSV lists from my bank actually show a unique transaction ID. I tried feeding that column both to "Tansaction ID" and to "Check #". However, moneyGuru does not recognize transfers from my checkings account to my savings account as one single transfer. I end up with two disjunct transfers: one is going from my checkings account and the other is coming into my savings account.

If this the feature you're contemplating, you could:

  • Add import support for a transaction identifier ("Tansaction ID" is there allready but what it is used for, actually?)
  • Simply check for unassigned transactions with same date and same amount on existing assets/liabilities during the import run, and (propose to) merge those transactions

I guess such a simple check does the trick for a personal finance app. Your typical user would import transfers for two or three bank accounts only. Identifying transactions between those boils down to matching up the unassigned, no?

Regexp replace for transfer targets would be just awesome. Ideally it would predict my success rate by telling me how many previously assigned transfers to my "groceries" account match the super fancy expression I am typing (and my failure rate, the matches with transfers assigned to other accounts).

Really like your work. Keep it up!

hsoft avatar Jun 22 '13 16:06 hsoft

The transaction ID is used for automatically "binding" transactions to existing ones during imports. This feature comes from OFX import where it's not possible to have the same transaction ID from two different accounts. I agree that in the case of CSV imports, it could also be used to connect transfer accounts. However, I'm afraid this new feature could bring all sorts of new complications. I'd have to think about it. Meanwhile, I created #233 for this feature.

hsoft avatar Jun 22 '13 16:06 hsoft

[bulk edit]

hsoft avatar Jun 22 '13 16:06 hsoft

Another similar piece of software I recently reviewed let's you set up or learn rules, and then apply them by selecting transactions and pressing the auto fill button.

This way it happens after the import step, so may avoid some of the difficulties.

Could be problematic if it's not easy to tell which transactions need auto filling (maybe up to user's own process or workflow, or a tag on transactions that's set once they've been autofilled, or a "last import" view like in iPhoto, etc for the most recently imported data)

andrew-hill avatar May 05 '14 16:05 andrew-hill

I'd like to take a stab at this ticket.

This and #166 are probably the most important tickets for my own use.

I do all my management once a month or I let it go for a couple of months and I'm curious about where I am, and then update if I have a need. Every time I start again, I have a small script that collects up my transactions (e-statements from my credit cards and banks), eliminates double transactions, renames things based on an occurrence file, etc. So general pre-processing. If it doesn't work out, I look at the problem, roll back, make a change to the pre-processing, and try again.

It would be good to be able to integrate this kind of work into moneyguru, or make the process "plugin-accessible". I'm going to read through, experiment a little, and post some ideas.

brownnrl avatar Nov 21 '14 21:11 brownnrl

Awesome! as always, don't hesitate to ask me if you need any guidance.

This is a long standing ticket which I always wanted to implement but I've always seen it as too big to fit my will (as it was with the amount drawing thing). I'm glad to see you're not afraid of big tickets :)

hsoft avatar Nov 21 '14 21:11 hsoft

I started to write a wiki page for the feature here. It's just started, and my initial thoughts are an outlook style set of rules.

If [Column Name] [is/is not] [equal to/contains/matches] [value/regular expression] Then [map to account name/other actions] [account name or other values].

Some of the easier rules will be added automatically when the user edits a meaningless name (MEANINGLESS CRAP BLAH ** BLAH to groceries) which would create the rule

If [Account Name] [is] [equal to] [MEANINGLESS CRAP BLAH ** BLAH] then [map to account name] [groceries].

The user could create a group or set of these rules, and save them much like csv layouts are saved. This would also be a very good place to provide a plugin interface that plugin authors could write there own custom logic for importing.

Again, I'll provide more of this in a formal way with the wiki page.

brownnrl avatar Nov 27 '14 15:11 brownnrl

It's a great writeup and questions that arise from your proposition are kind of similar to the discussion in #13 (which is much older than 2013. these dates are due to me having to move issues to github): Do we really want to include that much "UI complexity" into the app?

The thing is that, like with calculated transactions, implementing the actual matching rules is rather simple. The complex part comes at the UI level and that complexity comes down not only on the developer, but also the user. I don't know about you, but I find designing mail filters in a mail client is tiresome.

In #13, we tried to find if there wasn't some cleverer way of solving the problem (and then we never implemented anything :) ). I suspect that if the feature in this ticket is implemented with a good old matching rules editor, it will be less used than if it's implemented in some cleverer way.

In the wiki, we mention cases of variable values, such as GBF Market Foods UIXXXCDE ** 3 and GBF Market Foods UIAASVXS ** 2. Instead of spending time implementing a UI, couldn't we send time on developing a nice heuristics algorithm that properly matches those strings?

To prevent mis-autofill, maybe that autofilling procedures could involve some kind of confirmation dialog where we show replacements that are about to take place and then allow the user to cancel some of them. We could then "teach" the algo to work better.

Also, very early in moneyguru development, we were asking ourselves "what if moneyguru was a personal finance manager for developers?". I say that because we could make it so that default autofill rules could be enhanced by plugins. Code is much more expressive than select boxes in a row.

These are just ideas I throw around to fuel your thoughts.

hsoft avatar Nov 29 '14 21:11 hsoft

(now that I re-read stuff from #13, I realize that there's a lot of context missing and it's hard to make sense of it. it wasn't a very good idea to reference it here... nevermind that.)

hsoft avatar Nov 29 '14 21:11 hsoft

Yes, I'm not done with the wiki (I was just editing it as you appended this). It's basically my thoughts as I go along figuring out how things currently work, and finally get to the implementation.

My thought is that I'd get this document from a fluid state to a more defined one. My other thought that the first few iterations would be to find a target in the core GUI code that would be a good target for a well designed component that implements a defined interface that could be used for a plugin.

If we did implement an learning heuristic or an "e-mail rules" style UI interface, the back-end would use this component style with well defined interface, making plugins extensible. This would be very much like the Trac style of writing plugins. For instance, provide an IImportActionProvider interface, which has some methods for defining some abstraction of the swap_* functions in core.gui.import_window.ImportWindow. Maybe then a IImportConditionalProvider, able to provide a boolean value based on a transaction or account data.

I don't have something concrete in mind for interfaces yet. But whatever features we choose to implement, we'd do so first defining an abstract interface that plugin authors could come behind us and change our silly mistakes without having to learn all the cross-platform toolkit communication shtuff which is where I am buried in reading currently :).

I do really appreciate the feedback though. I don't expect you to read through the whole thing all the time, I'll let you know when it's finished and you can read it if you'd like. It's more just a sounding board for my own thoughts instead of diving into code changes up front.

brownnrl avatar Nov 29 '14 21:11 brownnrl

find a target in the core GUI code that would be a good target for a well designed component that implements a defined interface that could be used for a plugin.

+1 . Plugins came way too late in moneyguru's design, but if we can think about them when implementing new features, it can only be good. The relationship between UI layers and plugins is a tricky problem, however.

hsoft avatar Nov 29 '14 22:11 hsoft

If you have a chance to quickly glance over the above commit, I've taken the "Fix Import" logic and abstracted it into plugins. The base class for import actions is core.plugin.ImportActionPlugin, but the currently used plugins themselves reside in core.gui.import_window. Each plugin that doesn't have a NAME can be considered an abstract base class.

All functionality should still be retained, I've tested manually with some test import files. I have to fix all the tests (but not change their intention!), to confirm that for certain though.

This is just a first pass attempt at getting a plugin API going, but it isn't hard to see even someone with limited programming experience extend these individual actions by dropping in a quick file in the plugins directory.

from core.plugin import ImportActionPlugin

class MySwitcherAction(ImportActionPlugin):
    NAME = "My Switcher Plugin"
    ACTION_NAME = "Switch with Durdy!"

    def perform_action(self, selected_pane, panes, transactions):
    "If Hurdy in the description, replace with Durdy and throw him a quick 20."
        # or match against regex's whatever as comments indicate earlier
        for txn in transactions:
            txn.description = txn.description.replace("Hurdy", "Durdy")
            txn.amount += 20

More work to follow.

brownnrl avatar Dec 05 '14 14:12 brownnrl

Sweet. Very elegant start, congratulations!

The only problem I see is with plugin subclasses living in the core code, but as you already mention in this ticket, a plugin overhaul is needed to bring this ticket to completion (something I totally agree with). So this isn't really a problem because these plugins will be moved elsewhere before the end.

Side note: the example you just gave is not a very good one because transaction.amount += 20 can't work. With splits and all, you can't increase the amount of a txn like that.

History note: I once implemented a new transaction model ( #16 ) that would allow this code to work, but users didn't like it and I reverted the feature back to typical splits :(

hsoft avatar Dec 05 '14 20:12 hsoft

Yes, the plugin architecture. How do you want to handle recommendations about that? It goes to a higher level then this ticket. New ticket? Maybe wait a little until concrete suggestions build up?

I don't know about your point mixing plugin and core code. I think Trac's pretty vibrant plugin eco-system of plugins came from them implementing interfaces and then immediately using those interfaces in their core code. See, for example:

https://github.com/edgewall/trac/blob/trunk/trac/ticket/api.py

They declare a bunch of interfaces, and then use them in their components. The purpose of the interfaces is to define some publicly accessible methods for communicating with components. The purpose of using a Component class is to register plugins and track who is using which interfaces. It also gives developers a reference implementation for every interface, which they know is used code rather then an experimental or nifty-never-used feature because it's part of the core.

Aside from that, the plugins that were defined are refactored core code. Originally, I had moved the specific bits into the plugin_examples folder, but I was continually deleting my local cache of them so they would be copied and regenerated. It's just a lot easier to have it live alongside the place it's used. The only difference between what was there and now we elevated it into another class, and we're just using the plugin architecture to make sure it's extensible. Our use of it in the core also ensures its, well... usable!

On the transaction.amount assignment, I didn't think about it at the time... but do imports rows generate more than one split per line? If they don't, this should work, right? Not best practice though...

brownnrl avatar Dec 05 '14 20:12 brownnrl

Not all imported files can yield transactions with more than two splits, but QIF and OFX can.

As for the plugin thing, I don't have a strong opinion about this and I trust that the end result will be elegant. Whatever you prefer will be fine by me.

But yeah, we should create a new ticket for the plugin overhaul. I'll do that right away.

hsoft avatar Dec 05 '14 22:12 hsoft

Oh, I also wanted to comment about this section in your wiki page

moneyGuru already has some form of limited auto-matching builtin. You might want to check the current implementation. Right now, it only does auto-matching when importing OFX file that have reference IDs in it (which are typically guaranteed to be unique). Then there's this whole "binding" mechanism with the little "lock" icon in between and all. Have you seen this? It might influence your analysis.

(there are example OFX files with references in them in the test suite if you want to try it out)

hsoft avatar Dec 05 '14 22:12 hsoft

I hadn't read it at the time of writing the wiki entry, and my only experience had been with the CSV files at the time. I've since read the documentation on importing into a new account, and I will be experimenting with the feature and reviewing the code before making any additional assumptions or changes.

All tests now pass in the last commit. None of the tests had to be changed (which I guess is a testament to your high level testing).

So at this point, I'm going to do some feature review as you stated, and after I have a solid grasp try to determine how the user experience might change... do we make the import table edit-able, how would users see before/after changes, etc. Then go from there.

brownnrl avatar Dec 08 '14 03:12 brownnrl

Attached are my initial thoughts as to how I'm going to proceed with this ticket form here. I'll update the wiki before progressing, but the "import pipline" (image attached) will look something like this:

  1. From the loader, any import action plugin marked as "always apply" will be a pre-processing step just before the reaching the intelligent import autofill heuristic.
  2. The heuristic will do it's best to just change the existing import fields based on prior user edits, looking for string similarities, etc. (this needs to be formally thought out and written up)
  3. An extensible "bind transactions" plugin will take the result of that and apply user defined rules for binding transactions together.
  4. At this point the user is given the chance to provide feedback. If all is well, they complete the import. If not, they can perform steps 5 or 6.
  5. Apply any one of a number of selectable actions. Goto step 7.
  6. Unbind an existing transaction (as it is now), or hand edit the unbound import data. Goto step 7.
  7. The intelligent import autofill feature (plugin?) takes the changes and records them for future imports. Goto step 3.

So, in our bubble diagram, "Apply select action" (step 5) is already implemented (we still need to collect up the plugins from the plugin directory). "Always apply actions" (step 1) is also easily implemented. The only difference I might suggest is that these actions be able to be performed when selected by the user on single or multiple selected transactions as well as the entire pane or set of panes.

Under "Hand edit data", the import table will have to accept edits. The binding/unbinding via drag and drop is as moneyguru currently acts, but I was thinking of adding a third state of "questionable bind" where the lock would be followed by a question mark. This would indicate to the user that the automated binding rules were applied, and they should review the result of the application of these rules closely before continuing.

Finally, we have the intelligent import autofill, which will take a first crack based on the pre-processing steps and then allow the user to make their edits and record the changes. We'll want to provide a pane for configuration preferences (whatever they might be) or the ability to completely turn off the intelligent import autofill in case it acts outside of reason for some users. Again, it's behavior needs to be formally written up in the wiki proposal.

See the attached image.

import_pipeline

brownnrl avatar Dec 08 '14 22:12 brownnrl

I don't think I've ever thought out the import process as deeply, so I have questions rather than critiques.

  1. Is there a specific reason for the loop going back to the autofill/bind processes? I can't think of a situation where that would be needed. Do you have a scenario in mind?
  2. Which extensible parts will be needing the Store API we were talking about earlier?
  3. What do you mean by "overrides col user data"?

Looks good to me. I haven't looked at your latest commit yet, I'll do so as soon as I have free time.

hsoft avatar Dec 08 '14 23:12 hsoft

Keep in mind that "always apply actions", "bind transactions", and the extended "select actions" will for the majority of users be the same as current functionality or no-ops. That is where your custom code will go. This concept pretty much rules out the UI layer complexity representing rules, replaced by creating dead simple plugin code. So with that in mind...

Your first question... (1.)

We have the intelligent import autofill which changes transaction attributes, and binding rules that bind based on transaction attributes. So the transaction attributes need to be changed before the binding rules take effect. And if the transaction attributes were incorrectly changed, the binding rules will need another try at the corrected transaction.

Let's say I had an import of my credit card statement. I have a binding rule that says: amounts from checking to pay my credit card on the same date in the same amount should always bind to the similar one from my checking account. I only pay my cards from my checking account. My credit card statements represents these transfers as "Date Posted, Payment - Thank you, (transaction ID), -400.23". I don't see it bind right away because the transfer isn't correct. I change it from "Payment - Thank you" to "Checking". This edit needs to go back to the intelligent import autofill so it remembers that change for the future. The binding rules now have the correct account "Checking" for the transfer, and bind to the transaction from my bank checking account statement that already existed from a prior input.

Note that now that I read it, is this binding? To me, it means the other side of a transaction between two accounts as well as previously imported transactions from a file with an overlapping date range as the one currently being imported. I think that's correct.

Your second question... (2.)

So we want to provide this as an option for plugin developers. So I could see someone developing an "always apply" plugin that the plugin developer tells users to go to their data directory and edit the CSV file where the first column is original descriptions and the right is replacements. This might not work for every case, but it's the first example I could think of where I would want to have some kind of file. When I do the intelligent import autofill, I'd expect that I'd like to take a hard look at customizations available there as well. But it's for whatever data that the plugin developer wants to store between application runs.

Your third question... (3.) That's my sloppy writing... it's "overrides w/ user data" which is my shorthand for "overrides with user data".

I don't know if you need to review the commit, but feel free. It was just a clean up of the original commit and making the tests work. I did change the per-index name replacements to a total refresh of the names for all actions per your one comment. I plan on going through your reviews that are cosmetic or "good design or code style" and implement them together to minimize your time.

brownnrl avatar Dec 09 '14 01:12 brownnrl

So I was having a lot of issues figuring out how I could do things like change the transfer to a new account (that may not exist in the import accounts). However, if I made certain transactions change or added accounts I had to "refresh" the entry list somehow. I started looking at the account list in the loader, and then looking at how the Document class performed those changes.

However, I was worried about duplicating a lot of transaction/account modification code that the Document defines. So that brought me to the possibility of using the Document class to make those modifications. I couldn't use the existing Document, so instead I created a subclass called ImportDocument.

The ImportDocument has one instance for the ImportWindow, where all entries are imported into it (with no matches) and then matches are found against the existing Document. You can use this ImportDocument as a staging ground, perform actions, manually edit (eventually), and see the updates reflect after each action.

As an example, when you use a swap-date field currently, the entry order does not change. With the use of the ImportDocument the transactions now reflect the correct order.

All tests pass, and manual tests seem to yeild positive results importing both CSV and moneyguru account entries. If you have a chance, check out the small changes to tests, and the new test added "test_switch_transfer_accounts" in import_window_test.py.

I'd like to know if this approach is one you would consider merging with enough polish one day before I continue, or if you could suggest a better direction. My next iteration will be making the "right half" import side of the import table editable, and creating extensible binding procedures.

brownnrl avatar Dec 12 '14 23:12 brownnrl

It's astounding that I've become a bottleneck in your progress! You're on a fast paced streak. I've looked at your code briefly yesterday and the Document subclassing thing is a very interesting twist, but it has a lot of implications. I'll have to think about it.

However, I'm more worried at #402 right now and I'll look at this issue first (after all, it's a data loss issue). I hope to make a 2.8.1 release tomorrow.

hsoft avatar Dec 13 '14 16:12 hsoft

It's not the best conditions to review your code because I see the solution (ImportDocument) to a problem you had during development, but I don't see the part where you solve that problem (for which I don't blame you, I'm guessing that your work directory is a big mess of trial and errors and that you commit only what you consider clean).

So from my point of view, we introduce this big change for no benefit. But now, let's set that aside and take for granted that this change has benefits. We can still ask ourselves: Is ImportDocument overkill for what it brings?

I'm inclined to say yes. Document does a lot of things that we don't need and it was never designed with subclassing in mind. We can begin to see the kind of problems we'll have by looking at the cook tweak (btw: why do you need cooking at all? This is for computing running totals, but we don't have them in the import window). The document has a notion of current date range (and everything Document does is conditioned by it) and imports don't. This is one of the things that I'm sure will come and bite us down the road.

I write this, but I also don't want to be the one who makes your job harder. You're already knee deep in complexity. I'd say that if it solves your immediate problem, go ahead and we'll refactor it away before merging.

My gut tells me that if we make a core class "bubble up" to ImportWindow, it will be more something like an enhanced TransactionList.

I have trouble seeing how entry list refreshing is a problem, so I'll wait until I have further details, but I can maybe help you about "figuring out how I could do things like change the transfer to a new account":

In the import window, all Account instances are dummy ones. When it's import time, (in Document.import_entries()), only their name is important: For each account, we look if the name exists in the document, if not, we create a new account.

Now, it's true, with the ability to edit import entries manually comes the ability to spawn new dummy accounts. For that, I would recommend maintaining a parallel AccountList in ImportWindow. It's a rather self-sufficient class that should help you translate user edit operations to Account instances (whether it's a reused instance or a new one).

Do I make sense at all? I'd like to help you, but I can't see your problematic code.

hsoft avatar Dec 14 '14 20:12 hsoft

Oh damn, here I am, misleading you. cook is necessary to bind entries to accounts! But the Oven does many other things like scheduling and budget spawns. I wouldn't re-use it for imports, too tricky. You're better off copy/pasting Oven._cook_splits() (or extracting it away).

Tricky, tricky stuff.

hsoft avatar Dec 14 '14 20:12 hsoft

Yes, so I am viewing the entire branch as a learning process. I get to a point where I say, "Ok, I need to refresh my list of entries. Hmm, how do I do that?" So I go into the model, and see that they are generated by the oven, and it leads me down the rabbit hole.

But I do feel that I am learning. Ideally, like you say, I'd like to implement or reuse the loader's TransactionList, extract some logic from the Oven or the Document class which would enable a clean solution, but I didn't (and for the most part, still don't) know enough about how all the classes communicate with each other to do that extraction / refactoring. However, I'm getting results and all the tests are still passing.

I'm still working on it, and I think I'll have a rough draft of something that completely works. I'll probably post a video of the "end result" so we can develop a target. Even if we throw it away (where the working solution introduces too much technical debt), I've learned a lot about how the application works and I'm willing to start over keeping the good ideas and throwing the bad ones away. Bubbling up as you put it.

Right now, I'm focused on importing entries from two overlapping CSV files, and having the extensible binding plugin recognize the overlapping import, bind those transactions, and turn off the import for the overlapping ones. The current functionality of reference binding which allows the current battery of tests to pass is mimic'd with ReferenceBind class here. The desired functionality has a starting implementation in EquivalenceBind here.

brownnrl avatar Dec 14 '14 21:12 brownnrl

Also, in reference to your question about why ImportDocument is needed to change account transfers, I have a test defined in import_window_test.py which checks for the phrase "automatic" and then changes it to an account labeled "checking".

        def perform_action(self, selected_pane, panes, transactions):
            import_doc = selected_pane.import_document
            auto_pays = []

            # We go through all of our transactions searching for the keyword
            # automatic.
            for txn in transactions:
                for split in txn.splits:
                    if 'automatic' in split.account_name.lower():
                        # We collect up the actual account names marked for transfer
                        auto_pays.append(split.account_name)

            if not auto_pays:
                return

            checking = import_doc.accounts.find('checking')

            if checking is None:
                # Create a checking account if no checking account is present
                # in our current list of transactions
                checking = import_doc.new_account(AccountType.Asset, None)  # No group
                import_doc.accounts.set_account_name(checking, 'checking')

            # Reassign our transfers to checking
            auto_pay_accounts = [import_doc.accounts.find(ap) for ap in auto_pays]
            import_doc.delete_accounts(auto_pay_accounts, checking)

And this is confirmed to work in the test where the first two transactions have 'automatic' in them, and the last does not:

@with_app(app_with_custom_import_plugin)
def test_switch_transfer_accounts(app):
    # same as with dates: don't switch twice
    app.iwin.swap_type_list.select(SwapType.DescriptionPayee)
    app.iwin.perform_swap(apply_to_all=True)
    eq_(app.itable[0].transfer_import, 'checking')
    eq_(app.itable[1].transfer_import, 'checking')
    eq_(app.itable[2].transfer_import, 'GAS')

In essence, this is our intent in this ticket. I thought the easiest way to make that kind of change would be to have an ImportDocument object to do these kinds of manipulations.

brownnrl avatar Dec 15 '14 02:12 brownnrl