hledger icon indicating copy to clipboard operation
hledger copied to clipboard

wish/question: do not sort csv-imported entries

Open ShrykeWindgrace opened this issue 3 years ago • 3 comments

Hi!

I have a real-world use-case where sorting entries generated by a csv import breaks balance assertions - because entries and assertions are given in the chronological order as treated by the bank, not in the chronological order of client's actions.

A minimal example is here: https://github.com/ShrykeWindgrace/hledger_csv_import_order

Is there an elegant way to bypass this sorting? I see these solutions:

  • hlender import/hledger print csv lines one by one via a custom script; error-prone, but doable
  • sort entries manually; tedious
  • drop balance assertions; I am reluctant to do so
  • use completion date as transaction date; looks like going against the source of truth
  • some import option that I missed

It is also possible to shoehorn-in a switch hledger import --do-not-sort-csv-entries, but I do not have a clear idea on how this could impact entry deduplication.

ShrykeWindgrace avatar Jun 02 '22 17:06 ShrykeWindgrace

One of my banks is crazy enough to do what yours does, plus then rearrange the entries for any given day in a seemingly random order. I ended up writing a preprocessor that orders csv lines such that amounts + balances neatly tie together :(

adept avatar Sep 16 '22 23:09 adept

Or in other words, the balance amounts don't correspond with the order of transactions in the CSV, right ?

I found the same problem in OpenCollective's transactions CSV recently, and actually could not figure out their logic, I had to just skip balance assertions..

simonmichael avatar Sep 17 '22 02:09 simonmichael

To be clear: though I haven't looked at the example yet, I think the problem is when banks own data is wrongly ordered. hledger should check assertions according to the order of the CSV records (possibly influenced by newest-first. Ignore "hledger always sorts the generated transactions by date" there, that is referring to the print command's date sorting (Hledger.Reports.EntriesReport), which happens after reading and checking the CSV (Hledger.Read.Common.journalFinalise).)

simonmichael avatar Sep 17 '22 02:09 simonmichael

hledger should check assertions according to the order of the CSV records (possibly influenced by newest-first.

Could you elaborate a bit on this? Do you mean that this behavior needs to be implemented?

ShrykeWindgrace avatar Sep 20 '22 13:09 ShrykeWindgrace

No I mean it already works that way, and I believe the problem is with your bank csv, not hledger. If you don't think so, maybe you can minimise/anonymise and share it with us or with me.

simonmichael avatar Sep 20 '22 15:09 simonmichael

@simonmichael that's what the MWE in https://github.com/ShrykeWindgrace/hledger_csv_import_order tries to show.

The bank and the CSV orders the entries (and thus balance assignments) with respect to completion date. Upon import hledger sorts them with respect to start date - meaning that balance assignments no longer pass the validation.

For a file

START DATE,DESCRIPTION,AMOUNT,COMPLETION DATE,BALANCE
2022-01-01,First topup,100EUR,2022-01-01,100EUR
2022-01-02,Second topup,1000EUR,2022-01-02,1100EUR
2022-01-01,First expense,-10EUR,2022-01-02,1090EUR

with rules

skip 1
date-format %Y-%m-%d
separator ,

account1 Assets
account2 Unsorted

fields date, description, amount, date_ended, balance

comment1 date: %date_ended

the import hledger import data.csv -f test.journal produces a journal file

2022-01-01 First topup
    Assets            100EUR = 100EUR  ; date: 2022-01-01
    Unsorted         -100EUR

2022-01-01 First expense
    Assets            -10EUR = 1090EUR  ; date: 2022-01-02
    Unsorted           10EUR

2022-01-02 Second topup
    Assets           1000EUR = 1100EUR  ; date: 2022-01-02
    Unsorted        -1000EUR

which would not pass the checks done by, e.g. hledger balance.

If I add a newest-first hint in the rules file, the journal file becomes

; journal created 2022-09-21 by hledger

2022-01-01 First expense
    Assets            -10EUR = 1090EUR  ; date: 2022-01-02
    Unsorted           10EUR

2022-01-01 First topup
    Assets            100EUR = 100EUR  ; date: 2022-01-01
    Unsorted         -100EUR

2022-01-02 Second topup
    Assets           1000EUR = 1100EUR  ; date: 2022-01-02
    Unsorted        -1000EUR

which would not pass balance assertion checks either.

For what it's worth,

PS> hledger --version
hledger 1.27-ga0be6c2c8-20220901, windows-x86_64

ShrykeWindgrace avatar Sep 21 '22 12:09 ShrykeWindgrace

@ShrykeWindgrace thanks, I see what you mean.

Processing balance assertions/assignments in date order is a fairly core value of hledger's and a differentiator from Ledger. Of course we could implement processing them in parse order, but to me that seems fragile, not often needed and probably not worth the trouble.

So I don't have a much better idea than disabling balance assertions for this CSV, or at least the problematic ones where records are not date-ordered.

simonmichael avatar Jan 12 '23 08:01 simonmichael

Or transform that CSV like @adept did. Perhaps you could generate another date field that corresponds to the order of balance changes - that would seem a more accurate clearing date.

simonmichael avatar Jan 12 '23 17:01 simonmichael