beancount-import
Following `beancount.ingest`/`beangulp` workflow
Hi --
Thanks for the awesome package!
I am looking to leverage `beancount-import` in the `identify` -> `extract` -> `archive` and `generate` -> `test` workflow from `beancount.ingest`/`beangulp`. It seems like with the new support of beancount importers, this is far more achievable.
Two specific questions:
- Are there thoughts for how to best facilitate the workflow between `beancount.ingest`/`beangulp`, e.g. something simple like just replacing the `extract` step with `beancount-import` or something more dedicated built into `beancount-import`?
- Is there a way to follow the same workflow with `beancount-import` sources? The `beancount-import` `ofx` source is the most full-featured I have seen. Seems a bit duplicative to rewrite it all as a beancount importer. Would be great to have a means to `identify`, `generate` (that generates a "default" to be tested against, like with beancount ofx importer), and `test`.
Do I understand correctly that you would like to just use the beancount-import ofx source to non-interactively generate journal entries from ofx files?
And have it skip previously imported entries but not attempt to do any matching against manually-entered transactions or transactions from other sources, and not attempt to predict unknown accounts?
I would like to follow the same vanilla `beancount` patterns: `identify` -> `extract` -> `archive` and `generate` -> `test`. The benefit of the pattern is that there is determinism in the process.
With `beancount` importers, one can keep the same exact workflow, but just replace the `extract` step with `beancount-import`. With `beancount-import` sources, `beancount-import` handles the `identify` part when one configures the `data_sources`, and the `extract` step is handled semi-automatically. I am not as clear about how to achieve the `archive`, `generate`, and `test` steps.
For `archive`, `beangulp` archives to the associated account. I don't think this mechanic exists in `beancount-import`, nor are the importers necessarily associated with a single account.
As you mentioned, `generate` and `test` could possibly be accomplished by generating and comparing against a non-interactive, deterministic output, even if the file is incomplete (e.g. the output of `beangulp.examples.ofx`). One input file + one importer = one output. And the input and output files can serve as the test cases as the importers change.
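To make the "one input file + one importer = one output" idea concrete, here is a minimal sketch of a `generate`/`test` pair built on plain files. It does not use any `beangulp` or `beancount-import` API: the `Extractor` type, `expected_path`, `generate`, and `check` are all hypothetical names, and the extractor is assumed to be any deterministic function that turns an input file into journal text.

```python
from pathlib import Path
from typing import Callable

# Hypothetical interface: any deterministic function mapping an input file
# to the journal text it should produce.
Extractor = Callable[[Path], str]


def expected_path(source: Path) -> Path:
    """Expected output lives next to the input, e.g. statement.ofx.beancount."""
    return source.with_name(source.name + ".beancount")


def generate(source: Path, extract: Extractor) -> Path:
    """Write the current extractor output as the expected ("golden") file."""
    out = expected_path(source)
    out.write_text(extract(source), encoding="utf-8")
    return out


def check(source: Path, extract: Extractor) -> bool:
    """Return True if the extractor still reproduces the stored expected file."""
    expected = expected_path(source).read_text(encoding="utf-8")
    return extract(source) == expected
```

Running `generate` once per input file and then `check` after every importer change gives the regression behaviour described above, as long as the extractor itself is deterministic.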
I'm still not sure I understand what your objective is.
The `identify` step is not really in scope for beancount-import, and nothing in beancount-import currently handles that. My understanding is that the idea behind the identify step is that you will manually browse to your bank's website, and click on links to download files. All of these files will get dumped into the same download directory, and then you use the identify tool on that directory to avoid having to manually worry about the file names. For my own use I have instead relied on automating the download process (see https://github.com/jbms/finance-dl) so that files are all automatically downloaded with a suitable organization, and the identify step is not needed. Still, I can see that if you are downloading the files manually, the identify step may be useful. As beancount-import does not have any code to do that, though, you will have to either write it yourself or rely on something existing in beangulp.
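For the "write it yourself" route, a rough sketch of a standalone identify pass might look like the following. This is not beangulp's implementation and not part of beancount-import; `identify`, `looks_like_ofx`, and the matcher dictionary are hypothetical names, and the OFX check is only a heuristic (file extension, or the `OFXHEADER` marker near the start of the file).

```python
from pathlib import Path
from typing import Callable, Dict, Optional

# Hypothetical interface: a matcher answers "does this file belong to me?"
Matcher = Callable[[Path], bool]


def looks_like_ofx(path: Path) -> bool:
    """Heuristic only: OFX/QFX extension, or an OFX marker near the file start."""
    if path.suffix.lower() in {".ofx", ".qfx"}:
        return True
    with path.open("rb") as f:
        head = f.read(1024)
    return b"OFXHEADER" in head or b"<OFX>" in head


def identify(directory: Path, matchers: Dict[str, Matcher]) -> Dict[Path, Optional[str]]:
    """Map each file in the download directory to the first matcher claiming it."""
    result: Dict[Path, Optional[str]] = {}
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        result[path] = next(
            (name for name, matches in matchers.items() if matches(path)), None
        )
    return result
```

Calling `identify(download_dir, {"ofx": looks_like_ofx})` returns a mapping that can be printed as a manual check before anything is imported; files mapped to `None` were not claimed by any matcher.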
beancount-import basically addresses just the `extract` step. I'm still not clear, though, on whether you want to use beancount-import normally through the web UI, or you want some entirely non-interactive import as with beangulp.
The `archive` step is also not applicable to beancount-import: beancount-import relies on metadata on the imported entries to determine what has already been imported and avoid creating duplicate entries. Therefore, there is no need to store the source files in a separate location after they are imported, and in fact beancount-import relies on them being present as normal inputs in order to detect invalid references to source data in transaction metadata.
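As a rough illustration of that idea (not beancount-import's actual code), the sketch below models entries as plain dictionaries carrying a metadata key. The key name `source_id` and the functions `imported_ids`, `pending`, and `dangling` are hypothetical; the real sources use their own metadata keys and matching logic.

```python
from typing import Dict, Iterable, List, Set

Entry = Dict[str, str]

# Hypothetical metadata key linking a journal entry to a record in a source file.
KEY = "source_id"


def imported_ids(journal: Iterable[Entry]) -> Set[str]:
    """Collect the source identifiers already present in the journal."""
    return {entry[KEY] for entry in journal if KEY in entry}


def pending(source: Iterable[Entry], journal: Iterable[Entry]) -> List[Entry]:
    """Source records not yet imported; re-running never creates duplicates."""
    seen = imported_ids(journal)
    return [record for record in source if record[KEY] not in seen]


def dangling(source: Iterable[Entry], journal: Iterable[Entry]) -> List[str]:
    """Journal references whose source record is missing from the inputs."""
    available = {record[KEY] for record in source}
    return sorted(imported_ids(journal) - available)
```

The `dangling` check is why the source files have to remain available as inputs: if a file is moved out of the configured input location, every journal entry that refers to it would be reported as an invalid reference.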
As for generating test cases from your private data, there is no specific command-line tool for doing that, but the existing test framework essentially supports what you want. Take a look at ofx_test.py. You could very easily create test cases from additional data files. You can also potentially use ofx_sanitize.py to sanitize your source data so that you can create test cases that can be shared without leaking your private information.
Ultimately, I would like to use `beancount-import` normally through the web UI. The question is more to do with how to marry the two similar, but also sufficiently different, workflows effectively.
Concretely, I'm thinking the workflow could look like this:
- Download files (with arbitrary names), either manually or programmatically, to a central directory
- `identify` to determine which files correspond with which importers
  - This step isn't as critical, but is helpful information and serves as a manual check
  - Could also just be something displayed in the UI maybe
- `extract` via `beancount-import` UI
- `archive` to move file from central directory to an archive directory with proper naming (`beangulp extract` uses account name and date, which generally makes sense, although I'm not exactly sure if there is an "account" analogue for `beancount-import`)
  - As mentioned, this does not exist in `beancount-import`, and should be relatively straightforward to replicate, but would essentially be like maintaining another (albeit lightweight) `beangulp`/`beancount.ingest` importer just for archiving
- `generate` to generate a deterministic `.beancount` output (in `beangulp` case, this is basically just the `extract` output)
  - Admittedly I am not too familiar with `beancount-import` tests, but I think `beancount-import` is a superset of `beangulp`/`beancount.ingest`'s? Is there a way to generate the test cases programmatically?
  - End result is something like this?

        .
        └── archive
            └── Assets
                └── <path>
                    └── <to>
                        └── <account>
                            ├── 2021-05-18.statement.ofx
                            ├── 2021-05-18.statement.ofx.beancount
                            └── <other beancount-import outputs>

- `test` to automatically compare all files in a given directory against their `generate` output
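The `archive` step in the list above could be sketched roughly as follows. None of this exists in `beancount-import`; `archive_path` and `archive` are hypothetical helpers, and the account name and date are assumed to be supplied by the caller, since it is unclear what the "account" analogue would be.

```python
import shutil
from datetime import date
from pathlib import Path


def archive_path(root: Path, account: str, file_date: date, filename: str) -> Path:
    """Build <root>/<Account>/<Sub>/.../<YYYY-MM-DD>.<filename> from an account name."""
    return root.joinpath(*account.split(":")) / f"{file_date.isoformat()}.{filename}"


def archive(source: Path, root: Path, account: str, file_date: date) -> Path:
    """Move a downloaded file into the archive tree, refusing to overwrite."""
    dest = archive_path(root, account, file_date, source.name)
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(dest))
    return dest
```

One caveat, given the earlier point that beancount-import expects source files to stay available as inputs: the archive directory would presumably have to be the location the data source is configured to read from, so that moving a file does not invalidate references in already-imported entries.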
I think for generic `beangulp`/`beancount.ingest` importers, I can follow the exact workflow using the vanilla CLI, just replacing the `extract` step with `beancount-import` using the web UI. That leaves the ofx `beancount-import` source. Although it seems like there is probably a better solution, e.g. maybe implementing some archiving capability into `beancount-import` and building on the testing capabilities to automatically generate deterministic test cases from a file and automatically test against them?