Following `beancount.ingest`/`beangulp` workflow

Open wangkev opened this issue 3 years ago • 4 comments

Hi --

Thanks for the awesome package!

I am looking to leverage beancount-import in the identify -> extract -> archive and generate -> test workflow from beancount.ingest/beangulp. It seems like with the new support of beancount importers, this is far more achievable.

Two specific questions:

  • Are there thoughts for how to best facilitate the workflow between beancount.ingest/beangulp, e.g. something simple like just replacing the extract step with beancount-import or something more dedicated built into beancount-import?
  • Is there a way to follow the same workflow with beancount-import sources? The beancount-import ofx source is the most full-featured I have seen. Seems a bit duplicative to rewrite it all as a beancount importer. Would be great to have a means to identify, generate (that generates a "default" to be tested against, like with beancount ofx importer), and test.

wangkev May 15 '21 18:05

Do I understand correctly that you would like to just use the beancount-import ofx source to non-interactively generate journal entries from ofx files?

And have it skip previously imported entries but not attempt to do any matching against manually-entered transactions or transactions from other sources, and not attempt to predict unknown accounts?

jbms May 15 '21 19:05

I would like to follow the same vanilla beancount patterns: identify -> extract -> archive and generate -> test. The benefit of these patterns is that the process is deterministic.

With beancount importers, one can keep the same exact workflow, but just replace the extract step with beancount-import. With beancount-import sources, beancount-import handles the identify part when one configures the data_sources, and the extract step is handled semi-automatically. I am not as clear about how to achieve the archive, generate, and test steps.

For archive, beangulp archives to the associated account. I don't think this mechanic exists in beancount-import, nor are the importers necessarily associated with a single account.

As you mentioned, generate and test could possibly be accomplished by generating and comparing against a non-interactive, deterministic output, even if the file is incomplete (e.g. the output of beangulp.examples.ofx). One input file + one importer = one output. And the input and output files can serve as the test cases as the importers change.
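
The "one input file + one importer = one output" idea can be sketched as a small golden-file check. This is only an illustration: `extract` is a hypothetical stand-in for whatever deterministic, non-interactive function renders one input file as beancount text, and storing the expected output next to the input with a `.beancount` suffix is an assumed convention, not something beancount-import provides.

```python
from pathlib import Path
from typing import Callable

# Hypothetical: a deterministic function that renders one input file as text.
Extract = Callable[[Path], str]


def expected_path(input_path: Path) -> Path:
    """Expected output lives next to the input, e.g. x.ofx -> x.ofx.beancount."""
    return input_path.with_name(input_path.name + ".beancount")


def generate(input_path: Path, extract: Extract) -> Path:
    """Write (or overwrite) the expected output for one input file."""
    out = expected_path(input_path)
    out.write_text(extract(input_path), encoding="utf-8")
    return out


def check(input_path: Path, extract: Extract) -> bool:
    """Return True if the current output still equals the stored expected output."""
    stored = expected_path(input_path).read_text(encoding="utf-8")
    return stored == extract(input_path)
```

Running `generate` once per input file and `check` after every importer change would give the regression behavior described above.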

wangkev May 16 '21 03:05

I'm still not sure I understand what your objective is.

The identify step is not really in scope for beancount-import, and nothing in beancount-import currently handles that. My understanding is that the idea behind the identify step is that you will manually browse to your bank's website, and click on links to download files. All of these files will get dumped into the same download directory, and then you use the identify tool on that directory to avoid having to manually worry about the file names. For my own use I have instead relied on automating the download process (see https://github.com/jbms/finance-dl) so that files are all automatically downloaded with a suitable organization, and the identify step is not needed. Still, I can see that if you are downloading the files manually, the identify step may be useful. As beancount-import does not have any code to do that, though, you will have to either write it yourself or rely on something existing in beangulp.
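
For anyone who does want to write the identify step themselves, a minimal sketch might look like the following. Everything here is hypothetical: `identify`, `looks_like_ofx`, and the matcher mapping are illustrative names, not part of beancount-import or beangulp, and the OFX check is a crude content sniff rather than real parsing.

```python
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical matcher type: returns True if the named importer claims the file.
Matcher = Callable[[Path], bool]


def looks_like_ofx(path: Path) -> bool:
    """Crude sniff: look for an <OFX> tag near the start of the file."""
    head = path.read_bytes()[:2048].upper()
    return b"<OFX>" in head


def identify(directory: Path, matchers: Dict[str, Matcher]) -> Dict[str, List[str]]:
    """Map each file name in `directory` to the matcher names that claim it."""
    result: Dict[str, List[str]] = {}
    for path in sorted(directory.iterdir()):
        if path.is_file():
            result[path.name] = [
                name for name, match in matchers.items() if match(path)
            ]
    return result
```

A file claimed by no matcher, or by more than one, is exactly the kind of case the manual check is meant to surface.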

beancount-import basically addresses just the extract step. I'm still not clear, though, on whether you want to use beancount-import normally through the web UI, or you want some entirely non-interactive import as with beangulp.

The archive step is also not applicable to beancount-import: beancount-import relies on metadata on the imported entries to determine what has already been imported and avoid creating duplicate entries. Therefore, there is no need to store the source files in a separate location after they are imported, and in fact beancount-import relies on them being present as normal inputs in order to detect invalid references to source data in transaction metadata.
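
To illustrate the idea (this is not the actual beancount-import code), metadata-based duplicate detection can be sketched with plain dictionaries. The `source_id` key and the record layout are assumptions made for the example; each real source uses its own metadata keys.

```python
from typing import Dict, List, Set

# Hypothetical metadata key; real sources use their own source-specific keys.
ID_KEY = "source_id"


def imported_ids(journal_entries: List[Dict]) -> Set[str]:
    """Collect the source ids recorded in the metadata of existing entries."""
    return {
        entry["meta"][ID_KEY]
        for entry in journal_entries
        if ID_KEY in entry.get("meta", {})
    }


def pending(source_records: List[Dict], journal_entries: List[Dict]) -> List[Dict]:
    """Source records whose id does not yet appear in the journal."""
    seen = imported_ids(journal_entries)
    return [record for record in source_records if record["id"] not in seen]


def dangling(journal_entries: List[Dict], source_records: List[Dict]) -> List[str]:
    """Ids referenced in the journal that no source record provides."""
    known = {record["id"] for record in source_records}
    return sorted(imported_ids(journal_entries) - known)
```

The `dangling` check shows why the source files have to remain available as inputs: once a file is moved away, its ids can no longer be told apart from invalid references.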

As for generating test cases from your private data, there is no specific command-line tool for doing that, but the existing test framework essentially supports what you want. Take a look at ofx_test.py. You could very easily create test cases from additional data files. You can also potentially use ofx_sanitize.py to sanitize your source data so that you can create test cases that can be shared without leaking your private information.

jbms May 17 '21 03:05

Ultimately, I would like to use beancount-import normally through the web UI. The question is more to do with how to marry the two similar, but also sufficiently different, workflows effectively.

Concretely, I'm thinking the workflow could look like this:

  • Download files (with arbitrary names), either manually or programmatically, to a central directory
  • identify to determine which files correspond with which importers
    • This step isn't as critical, but is helpful information and serves as a manual check
    • Could also just be something displayed in the UI maybe
  • extract via beancount-import UI
  • archive to move file from central directory to an archive directory with proper naming (beangulp archive uses account name and date, which generally makes sense, although I'm not exactly sure if there is an "account" analogue for beancount-import)
    • As mentioned, this does not exist in beancount-import, and should be relatively straightforward to replicate, but would essentially be like maintaining another (albeit lightweight) beangulp/beancount.ingest importer just for archiving
  • generate to generate a deterministic .beancount output (in beangulp case, this is basically just the extract output)
    • Admittedly I am not too familiar with beancount-import tests, but I think they are a superset of beangulp/beancount.ingest's? Is there a way to generate the test cases programmatically?
    • End result is something like this?
      .
      └── archive
          └── Assets
              └── <path>
                  └── <to>
                      └── <account>
                          ├── 2021-05-18.statement.ofx
                          ├── 2021-05-18.statement.ofx.beancount
                          └── <other beancount-import outputs>
      
  • test to automatically compare all files in a given directory against their generate output
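
As a rough sketch of the archive step in the list above, assuming the account name and statement date are already known for each file (how to derive them for a beancount-import source is the open question), the move could be as simple as the following. The function names and the `Assets:Bank:Checking` account in the comment are hypothetical.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_destination(
    archive_root: Path, account: str, file_date: date, filename: str
) -> Path:
    """Build e.g. archive/Assets/Bank/Checking/2021-05-18.statement.ofx."""
    account_dir = archive_root.joinpath(*account.split(":"))
    return account_dir / f"{file_date.isoformat()}.{filename}"


def archive(src: Path, archive_root: Path, account: str, file_date: date) -> Path:
    """Move `src` into the archive tree, refusing to overwrite an existing file."""
    dest = archive_destination(archive_root, account, file_date, src.name)
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest))
    return dest
```

Note that the archive directory would then have to be the location the data source is configured to read from, since beancount-import needs the source files to remain available as inputs.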

I think for generic beangulp/beancount.ingest importers, I can follow the exact workflow using the vanilla CLI, just replacing the extract step with beancount-import using the web UI. That leaves the ofx beancount-import source. It seems like there is probably a better solution, though, e.g. maybe implementing some archiving capability in beancount-import and building on the testing capabilities to automatically generate deterministic test cases from a file and automatically test against them?

wangkev May 19 '21 02:05