beancount-import: Help troubleshooting a custom GenericImporterSource

I have a custom Beancount-native payslip importer that I'm trying to load through GenericImporterSource, but it is behaving oddly.

Starting from a journal that doesn't have any payslip transactions (but has many from other sources, including OFX for my checking account), everything works fine: the pending checking account transactions (those with Expenses:FIXME) are correctly matched against the payslips.

The problem comes after this initial match: when I restart the beancount-import server, it shows all the same payslip transactions again, but this time there are no pending checking account transactions left to match, so it suggests creating "standalone" transactions.

I've been trying to fix this issue for a couple of months (on and off) already... I tried to debug the code and find the missing piece, but I'm having a hard time going through it, thus I'm asking for help.

What could be going wrong here? Is there any high-level documentation on the matching process that I could look at to try to figure it out on my own? Any help is appreciated.

May 15 '22 14:05 tavlima

@jbms, sorry for tagging you directly, but I'd appreciate some help with this. Could you please advise or perhaps just point me towards some hard-to-find documentation?

May 28 '22 13:05 tavlima

After importing some entries via the generic importer, what is supposed to happen is that beancount-import recognizes the postings to the imported accounts as "cleared" by the presence of the source_desc metadata. Cleared postings cannot be matched to newly imported transactions when determining match candidates. But there is a separate mechanism, also based on the source_desc metadata, that should exclude the source entries from being imported again. If that mechanism isn't working, they may appear in the invalid references list.

In summary, I don't know exactly what is wrong, but perhaps what I've described will help you investigate the issue more easily.
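
To make the two mechanisms above concrete, here is a toy model in plain Python. It is only a sketch of the idea, not beancount-import's actual code: `ToyPosting`, `is_cleared` and `already_imported` are hypothetical names, and the only detail taken from the comment above is that the `source_desc` metadata key drives both checks.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Metadata key named in the comment above; everything else here is hypothetical.
SOURCE_DESC_KEY = "source_desc"


@dataclass
class ToyPosting:
    """Stand-in for a journal posting: an account plus optional metadata."""
    account: str
    meta: Optional[Dict[str, str]] = None


def is_cleared(posting: ToyPosting) -> bool:
    """Mechanism 1: a posting with source_desc metadata counts as cleared."""
    return posting.meta is not None and SOURCE_DESC_KEY in posting.meta


def already_imported(journal: List[ToyPosting], account: str,
                     source_desc: str) -> bool:
    """Mechanism 2: skip a source entry if the journal already references it."""
    return any(
        p.account == account
        and is_cleared(p)
        and p.meta[SOURCE_DESC_KEY] == source_desc
        for p in journal
    )


journal = [
    ToyPosting("Assets:Checking", {SOURCE_DESC_KEY: "ACME PAYROLL"}),
    ToyPosting("Income:Salary", {SOURCE_DESC_KEY: "Payslip 2022-04"}),
    ToyPosting("Expenses:FIXME"),  # no metadata, so still open for matching
]

cleared_flags = [is_cleared(p) for p in journal]
reimport_blocked = already_imported(journal, "Income:Salary", "Payslip 2022-04")
reimport_allowed = already_imported(journal, "Income:Salary", "Payslip 2022-05")
```

If the journal postings written for the payslip accounts carry no `source_desc`, or carry one that differs from what the importer produces on the next run, the second check in this toy model never fires, which would look like the re-import behavior described in the question. Whether that is the actual cause here is a guess.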

May 28 '22 18:05 jbms

> The problem I have is after this initial match: when I restart the beancount-import server

I'm also a bit puzzled by the expected behavior when adding to an existing transactions file. The section of code in get_pending_and_invalid_entries()

    for raw_entry in raw_entries:
        key = get_key_from_raw_entry(raw_entry)
        if matched_postings_counter[key] > 0:
            matched_postings_counter[key] -= 1
        else:
            results.add_pending_entry(make_import_result(raw_entry))

    for key, entry_posting_pairs in matched_postings.items():
        extra = matched_postings_counter[key]
        if extra:
            results.add_invalid_reference(
                InvalidSourceReference(extra, entry_posting_pairs))

means that any keys in matched_postings_counter (from the entries already in transactions.beancount for the current account) that are not found in raw_entries (the newly imported entries) get marked as invalid. Wouldn't we expect that the majority of existing transactions won't match whatever is currently being imported? For example, if I import my credit statement every month, all the existing transactions from May are not going to line up with what I'm currently importing for June.
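
The behavior described above can be reproduced with a small standalone sketch. It mirrors only the counting logic from the excerpt, using plain strings as keys; `split_pending_and_invalid` and the sample keys are hypothetical and stand in for the real `RawEntryKey` values.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple

Key = Hashable


def split_pending_and_invalid(
        journal_keys: Iterable[Key],
        raw_keys: Iterable[Key]) -> Tuple[List[Key], List[Tuple[Key, int]]]:
    """Same counting scheme as the excerpt, reduced to bare keys."""
    remaining = Counter(journal_keys)  # how often each key is in the journal
    pending = []
    for key in raw_keys:
        if remaining[key] > 0:
            remaining[key] -= 1  # raw entry is accounted for by the journal
        else:
            pending.append(key)  # raw entry is new
    # Journal keys that no raw entry consumed are reported as invalid.
    invalid = [(key, extra) for key, extra in remaining.items() if extra > 0]
    return pending, invalid


may = ["2023-05-03 coffee", "2023-05-20 rent"]
june = ["2023-06-02 coffee", "2023-06-20 rent"]

# Journal holds May; the source directory only contains the June statement.
pending_june_only, invalid_june_only = split_pending_and_invalid(may, june)

# Journal holds May; the source directory keeps both May and June statements.
pending_cumulative, invalid_cumulative = split_pending_and_invalid(may, may + june)
```

In the first call both May keys come back as invalid references; in the second, nothing is invalid and only the June keys are pending. That suggests the counting logic assumes the source data is cumulative (old statements stay in place next to new ones), though that reading is an inference from the code rather than something confirmed in this thread.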

Jun 24 '23 23:06 addisonklinke

For reference, this is the function override I'm using to accomplish the goal of my workflow:

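import datetime
from typing import AbstractSet, Callable, Iterable, List, Set

from beancount.core.data import Directive, Posting, Transaction
# The beancount_import module paths below are assumed from the project's
# source layout; adjust them if they differ in your installed version.
from beancount_import.source import SourceResults
from beancount_import.source.description_based_source import (
    RawEntry, RawEntryKey, get_posting_source_descs)
from beancount_import.unbook import group_postings_by_meta, unbook_postings


def get_pending_and_invalid_entries(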
        raw_entries: Iterable[RawEntry],
        journal_entries: Iterable[Directive],
        account_set: AbstractSet[str],
        get_key_from_posting: Callable[
            [Transaction, Posting, List[Posting], str, datetime.date], RawEntryKey],
        get_key_from_raw_entry: Callable[[RawEntry], RawEntryKey],
        make_import_result: Callable[[RawEntry], Transaction],
        results: SourceResults) -> None:
    """Don't expect all imported entries to be in the journal, just avoid re-importing duplicates

    See discussion here
    https://github.com/jbms/beancount-import/issues/168#issuecomment-1605771275
    """
    existing_account_postings: Set[RawEntryKey] = set()
    for entry in journal_entries:
        if not isinstance(entry, Transaction):
            continue
        for postings in group_postings_by_meta(entry.postings):
            posting = unbook_postings(postings)
            if posting.meta is None:
                continue
            if posting.account not in account_set:
                continue
            for source_desc, posting_date in get_posting_source_descs(posting):
                key = get_key_from_posting(entry, posting, postings,
                                           source_desc, posting_date)
                if key is None:
                    continue
                existing_account_postings.add(key)

    for raw_entry in raw_entries:
        key = get_key_from_raw_entry(raw_entry)
        if key not in existing_account_postings:
            results.add_pending_entry(make_import_result(raw_entry))
    results.add_accounts(account_set)
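
One trade-off of the set-based override above is worth spelling out. The following standalone sketch compares it with the counting approach on plain string keys; `pending_by_set`, `pending_by_count` and the sample data are hypothetical simplifications, not code from beancount-import.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def pending_by_set(journal_keys: Iterable[Hashable],
                   raw_keys: Iterable[Hashable]) -> List[Hashable]:
    """Set-based dedup, as in the override: any known key is skipped."""
    existing = set(journal_keys)
    return [key for key in raw_keys if key not in existing]


def pending_by_count(journal_keys: Iterable[Hashable],
                     raw_keys: Iterable[Hashable]) -> List[Hashable]:
    """Count-based dedup: each journal key cancels only one raw entry."""
    remaining = Counter(journal_keys)
    pending = []
    for key in raw_keys:
        if remaining[key] > 0:
            remaining[key] -= 1
        else:
            pending.append(key)
    return pending


# The journal already holds one coffee purchase; the statement lists two
# purchases that produce the same key, plus one unrelated entry.
journal = ["2023-06-01 coffee 3.50"]
statement = [
    "2023-06-01 coffee 3.50",
    "2023-06-01 coffee 3.50",
    "2023-06-02 lunch 12.00",
]

set_pending = pending_by_set(journal, statement)
count_pending = pending_by_count(journal, statement)
```

With the set, the second coffee purchase is silently dropped; with the counter, it is still offered as pending. The override also no longer reports invalid references at all, so a journal entry whose source data has disappeared would go unnoticed. Whether either matters depends on how unique the keys from a given importer are.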

Jun 25 '23 01:06 addisonklinke
