hledger
                                
                                 hledger copied to clipboard
                                
                                    hledger copied to clipboard
                            
                            
                            
                        wish/question: do not sort csv-imported entries
Hi!
I have a real-world use-case where sorting entries generated by a csv import breaks balance assertions - because entries and assertions are given in the chronological order as treated by the bank, not in the chronological order of client's actions.
A minimal example is here: https://github.com/ShrykeWindgrace/hledger_csv_import_order
Is there an elegant way to bypass this sorting? I see these solutions:
- hlender import/- hledger printcsv lines one by one via a custom script; error-prone, but doable
- sort entries manually; tedious
- drop balance assertions; I am reluctant to do so
- use completion date as transaction date; looks like going against the source of truth
- some import option that I missed
It is also possible to shoehorn-in a switch hledger import --do-not-sort-csv-entries, but I do not have a clear idea on how this could impact entry deduplication.
One of my banks is crazy enough to do what yours does, plus then rearrange the entries for any given day in a seemingly random order. I ended up writing a preprocessor that orders csv lines such that amounts + balances neatly tie together :(
Or in other words, the balance amounts don't correspond with the order of transactions in the CSV, right ?
I found the same problem in OpenCollective's transactions CSV recently, and actually could not figure out their logic, I had to just skip balance assertions..
To be clear: though I haven't looked at the example yet, I think the problem is when banks own data is wrongly ordered. hledger should check assertions according to the order of the CSV records (possibly influenced by newest-first. Ignore "hledger always sorts the generated transactions by date" there, that is referring to the print command's date sorting (Hledger.Reports.EntriesReport), which happens after reading and checking the CSV (Hledger.Read.Common.journalFinalise).)
hledger should check assertions according to the order of the CSV records (possibly influenced by newest-first.
Could you elaborate a bit on this? Do you mean that this behavior needs to be implemented?
No I mean it already works that way, and I believe the problem is with your bank csv, not hledger. If you don't think so, maybe you can minimise/anonymise and share it with us or with me.
@simonmichael that's what the MWE in https://github.com/ShrykeWindgrace/hledger_csv_import_order tries to show.
The bank and the CSV orders the entries (and thus balance assignments) with respect to completion date. Upon import hledger sorts them with respect to start date - meaning that balance assignments no longer pass the validation.
For a file
START DATE,DESCRIPTION,AMOUNT,COMPLETION DATE,BALANCE
2022-01-01,First topup,100EUR,2022-01-01,100EUR
2022-01-02,Second topup,1000EUR,2022-01-02,1100EUR
2022-01-01,First expense,-10EUR,2022-01-02,1090EUR
with rules
skip 1
date-format %Y-%m-%d
separator ,
account1 Assets
account2 Unsorted
fields date, description, amount, date_ended, balance
comment1 date: %date_ended
the import hledger import data.csv -f test.journal produces a journal file
2022-01-01 First topup
    Assets            100EUR = 100EUR  ; date: 2022-01-01
    Unsorted         -100EUR
2022-01-01 First expense
    Assets            -10EUR = 1090EUR  ; date: 2022-01-02
    Unsorted           10EUR
2022-01-02 Second topup
    Assets           1000EUR = 1100EUR  ; date: 2022-01-02
    Unsorted        -1000EUR
which would not pass the checks done by, e.g. hledger balance.
If I add a newest-first hint in the rules file, the journal file becomes
; journal created 2022-09-21 by hledger
2022-01-01 First expense
    Assets            -10EUR = 1090EUR  ; date: 2022-01-02
    Unsorted           10EUR
2022-01-01 First topup
    Assets            100EUR = 100EUR  ; date: 2022-01-01
    Unsorted         -100EUR
2022-01-02 Second topup
    Assets           1000EUR = 1100EUR  ; date: 2022-01-02
    Unsorted        -1000EUR
which would not pass balance assertion checks either.
For what it's worth,
PS> hledger --version
hledger 1.27-ga0be6c2c8-20220901, windows-x86_64
@ShrykeWindgrace thanks, I see what you mean.
Processing balance assertions/assignments in date order is a fairly core value of hledger's and a differentiator from Ledger. Of course we could implement processing them in parse order, but to me that seems fragile, not often needed and probably not worth the trouble.
So I don't have a much better idea than disabling balance assertions for this CSV, or at least the problematic ones where records are not date-ordered.
Or transform that CSV like @adept did. Perhaps you could generate another date field that corresponds to the order of balance changes - that would seem a more accurate clearing date.