                        CSV import de-duplication
Hi, first of all, thank you for hledger: not only for the tool itself but especially for all the documentation, which is much needed by those of us not familiar with accounting (I came here to escape the obscurity of the ledger documentation).
Related doc
I have been reading the documentation already and still fail to fully understand what is going on here. My issue relates to these sections:
https://hledger.org/1.24/hledger.html#csv-format
https://hledger.org/import-csv.html
https://hledger.org/1.24/hledger.html#deduplicating-importing
https://hledger.org/1.24/hledger.html#import
Related issues
I also checked the open and closed issues and found some comments that are related to this, but not exactly the same.
Tool version
I am on Fedora, version 1.24 (AFAIK, there is nothing new regarding this in version 1.25).
Issue description
My lovely bank goes to great lengths to make downloading the transaction data inconvenient. I need to log in, select a date range, receive a notification on my phone, accept it, download the Excel file, convert it to CSV, etc. Sometimes the date range is not complete (capped by the bank), and I need to repeat the whole process with a smaller date range to download the missing data.
As a result, I sometimes end up with repeated transactions across CSV files. When importing this data with hledger, I end up with repeated transactions in my journal file.
Example
Here are the files used in this example:
# all.journal
2019-01-01 initial
    assets:bank:checking          $1000
    income:unknown               
# bank1.csv
Date, Description, Amount
12/11/2019, repeated, 1
12/11/2019, repeated, 1 
13/11/2019, unique-1, 2
# bank2.csv
Date, Description, Amount
12/11/2019, repeated, 1
14/11/2019, unique-2, 3
# bank.csv.rules
skip 1
currency $
date-format %-d/%-m/%Y
fields date, description, amount
What I am doing
$ hledger import bank1.csv --rules-file bank.csv.rules -f all.journal
imported 3 new transactions from bank1.csv
$ hledger -f all.journal print
2019-01-01 initial
    assets:bank:checking           $1000
    income:unknown
2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1
2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1
2019-11-13 unique-1
    expenses:unknown              $2
    income:unknown               $-2
I think this is expected (it is explained in https://hledger.org/1.24/hledger.html#deduplication but not in https://hledger.org/1.24/hledger.html#deduplicating-importing).
$ hledger import bank1.csv --rules-file bank.csv.rules -f all.journal
no new transactions found in bank1.csv
OK, this is expected: it is the de-duplication working as described in the manual.
$ hledger import bank2.csv --rules-file bank.csv.rules -f all.journal
imported 2 new transactions from bank2.csv
$ hledger -f all.journal print
2019-01-01 initial
    assets:bank:checking           $1000
    income:unknown
2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1
2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1
2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1
2019-11-13 unique-1
    expenses:unknown              $2
    income:unknown               $-2
2019-11-14 unique-2
    expenses:unknown              $3
    income:unknown               $-3
This I would NOT expect to happen. I would expect hledger to remember past transactions and therefore drop this last repeated one.
I'm glad the docs are helping! I try to make them complete.
The problem here is just that your successive CSV files have different names. The .latest.* state file is per CSV file name, so you should reuse the same CSV file name each time. I do this renaming after download with a makefile, something like cp ~/Downloads/blah-blah-checking-*.csv ~/finance/mybank-checking.csv. Perhaps the docs can be improved here.
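For anyone who prefers a plain shell script to a makefile, here is a minimal sketch of the same renaming step. It assumes the state file is named .latest.FILE and sits next to the CSV, as the import docs describe; the function names state_file_for and stage_latest and all paths are hypothetical, made up for illustration.

```shell
#!/bin/sh
# Sketch only: function names and paths are hypothetical, not part of hledger.

# Print the state file path that corresponds to a CSV path, assuming the
# ".latest.FILE in the same directory" naming described in the import docs.
# bank1.csv and bank2.csv map to different state files, so neither import
# knows what the other has already seen.
state_file_for() {
    printf '%s/.latest.%s\n' "$(dirname "$1")" "$(basename "$1")"
}

# Copy the most recently modified download matching PREFIX*.csv in SRC_DIR
# to DEST, a stable file name that is reused for every import.
# Assumes download file names contain no newlines.
stage_latest() {
    src_dir=$1   # directory holding the bank downloads
    prefix=$2    # common file name prefix of the downloads
    dest=$3      # stable CSV name that hledger import will read

    # ls -t sorts newest first; keep only the first entry
    newest=$(ls -t "$src_dir"/"$prefix"*.csv 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no download matching $src_dir/$prefix*.csv" >&2
        return 1
    fi
    cp "$newest" "$dest"
}

# Example use (paths are placeholders):
#   stage_latest ~/Downloads blah-blah-checking- ~/finance/mybank-checking.csv \
#     && hledger import ~/finance/mybank-checking.csv \
#          --rules-file bank.csv.rules -f all.journal
```

Because every import then reads mybank-checking.csv, the same state file is consulted each time, and rows dated on or before the last import are skipped. Adding --dry-run to the import command previews what would be added without changing the journal.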
Thanks, that does the trick indeed.
If you are going to update the documentation, may I suggest connecting the following chapters somehow (maybe by adding links between them, or even merging them)? It was difficult for me to find the information, since it is scattered across a long single-page manual.
https://hledger.org/1.24/hledger.html#deduplication
https://hledger.org/1.24/hledger.html#deduplicating-importing
These sections link to each other now.