
CSV import de-duplication

Open: kiteloopdesign opened this issue 3 years ago (2 comments)

Hi, first of all, thanks for hledger. Not only for the tool itself, but especially for all the documentation, which is much needed by people like me who are not familiar with accounting (I arrived here running away from the obscurity of the Ledger documentation).

Related docs

I have already read some of the documentation, but I don't fully understand what is going on here. My issue relates to these sections:

https://hledger.org/1.24/hledger.html#csv-format
https://hledger.org/import-csv.html
https://hledger.org/1.24/hledger.html#deduplicating-importing
https://hledger.org/1.24/hledger.html#import

Related issues

I also checked open and closed issues and found some related comments, but none describing exactly this problem.

Tool version

I am on Fedora, hledger version 1.24 (AFAIK, there is nothing new regarding this in version 1.25).

Issue description

My lovely bank goes to great lengths to make downloading transaction data inconvenient. I need to log in, select a date range, receive a notification on my phone, accept it, download the Excel file, convert it to CSV, etc. Sometimes the downloaded date range is incomplete (capped by the bank), and I have to repeat the whole process, perhaps with a smaller date range, to download the missing data.

So I sometimes end up with the same transactions in more than one CSV file, and when I import these files with hledger, the duplicates end up in my journal file.

Example

Here are the files used in this example:

# all.journal
2019-01-01 initial
    assets:bank:checking          $1000
    income:unknown               

# bank1.csv
Date, Description, Amount
12/11/2019, repeated, 1
12/11/2019, repeated, 1 
13/11/2019, unique-1, 2

# bank2.csv
Date, Description, Amount
12/11/2019, repeated, 1
14/11/2019, unique-2, 3

# bank.csv.rules
skip 1
currency $
date-format %-d/%-m/%Y
fields date, description, amount
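As a rough illustration of what these rules ask for, here is a small Python reading of bank1.csv under the same settings. It is a hypothetical sketch (parse_bank_csv is a made-up name), not hledger's CSV reader. skip 1 drops the header row, fields names the three columns, and the day-first date format turns 12/11/2019 into 2019-11-12, which matches the dates hledger prints below.

```python
import csv
import io
from datetime import datetime

# Field names taken from the "fields" rule in bank.csv.rules.
FIELDS = ["date", "description", "amount"]


def parse_bank_csv(text: str, skip: int = 1) -> list:
    """Illustrative reading of bank.csv.rules; not hledger's actual CSV reader."""
    # skipinitialspace drops the blank after each comma in "12/11/2019, repeated, 1".
    rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))[skip:]  # "skip 1"
    records = []
    for row in rows:
        if not row:
            continue  # ignore blank lines
        rec = dict(zip(FIELDS, (cell.strip() for cell in row)))
        # "date-format %-d/%-m/%Y": Python's strptime has no %-d, but %d and %m
        # also accept non-zero-padded values such as "1/2/2019".
        rec["date"] = datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat()
        records.append(rec)
    return records


BANK1 = (
    "Date, Description, Amount\n"
    "12/11/2019, repeated, 1\n"
    "12/11/2019, repeated, 1 \n"  # trailing space, as in bank1.csv
    "13/11/2019, unique-1, 2\n"
)

for r in parse_bank_csv(BANK1):
    print(r["date"], r["description"], "$" + r["amount"])
# 2019-11-12 repeated $1
# 2019-11-12 repeated $1
# 2019-11-13 unique-1 $2
```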

What I am doing

$ hledger import bank1.csv --rules-file bank.csv.rules -f all.journal
imported 3 new transactions from bank1.csv
$ hledger -f all.journal print
2019-01-01 initial
    assets:bank:checking           $1000
    income:unknown

2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1

2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1

2019-11-13 unique-1
    expenses:unknown              $2
    income:unknown               $-2

I think this is expected (it is explained at https://hledger.org/1.24/hledger.html#deduplication but not at https://hledger.org/1.24/hledger.html#deduplicating-importing).

$ hledger import bank1.csv --rules-file bank.csv.rules -f all.journal
no new transactions found in bank1.csv

OK, expected: this is de-duplication working as described in the manual.

$ hledger import bank2.csv --rules-file bank.csv.rules -f all.journal
imported 2 new transactions from bank2.csv

$ hledger -f all.journal print
2019-01-01 initial
    assets:bank:checking           $1000
    income:unknown

2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1

2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1

2019-11-12 repeated
    expenses:unknown              $1
    income:unknown               $-1

2019-11-13 unique-1
    expenses:unknown              $2
    income:unknown               $-2

2019-11-14 unique-2
    expenses:unknown              $3
    income:unknown               $-3

This I would NOT expect. I would expect hledger to remember past transactions and therefore drop this last repeated one.
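To see why the third "repeated" transaction got in, it may help to model the state that hledger import keeps per CSV file name (the .latest.* file). The sketch below is a simplified, hypothetical model of the date-based approach the manual describes. For each file name it remembers the latest date seen and how many records had that date. On the next import under the same name, older records and already-counted records on that date are skipped. LatestState and import_new are made-up names, and this is not hledger's code.

```python
from dataclasses import dataclass


# Hypothetical model of the per-file state (cf. hledger's .latest.FILE file):
# the latest date seen under a given file name, and how many records had that date.
@dataclass
class LatestState:
    date: str = ""  # ISO date; "" means this file name has never been imported
    count: int = 0


def import_new(states: dict, filename: str, records: list) -> list:
    """Return the (date, description) records treated as new, then update state."""
    state = states.setdefault(filename, LatestState())
    new = []
    seen_on_latest = 0
    for date, desc in records:
        if date < state.date:
            continue  # older than the latest date remembered for this name
        if date == state.date:
            seen_on_latest += 1
            if seen_on_latest <= state.count:
                continue  # this many records on that date were already imported
        new.append((date, desc))
    # Remember the latest date in this file and how many records fall on it.
    dates = [d for d, _ in records]
    if dates:
        latest = max(dates)
        if latest > state.date:
            state.date, state.count = latest, dates.count(latest)
        elif latest == state.date:
            state.count = max(state.count, dates.count(latest))
    return new


BANK1 = [("2019-11-12", "repeated"), ("2019-11-12", "repeated"), ("2019-11-13", "unique-1")]
BANK2 = [("2019-11-12", "repeated"), ("2019-11-14", "unique-2")]

# As in this issue: two file names, so two independent states.
states = {}
print(len(import_new(states, "bank1.csv", BANK1)))  # 3
print(len(import_new(states, "bank1.csv", BANK1)))  # 0
print(len(import_new(states, "bank2.csv", BANK2)))  # 2 ("repeated" comes in again)

# Reusing one file name for every download instead.
states = {}
import_new(states, "mybank-checking.csv", BANK1)
print(import_new(states, "mybank-checking.csv", BANK2))  # [('2019-11-14', 'unique-2')]
```

In this model, bank2.csv starts with empty state, so its 12/11 row looks new. Under a single reused name, only unique-2 is new. One consequence of the model is that a later download covering only dates earlier than the latest one already seen would be skipped entirely under the same name.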

kiteloopdesign, Apr 26 '22 15:04

I'm glad the docs are helping! I try to make them complete.

The problem here is just that your successive CSV files have different names. The .latest.* state file is per CSV file name, so you should reuse the same CSV file name each time. I do this renaming after download with a makefile, something like cp ~/Downloads/blah-blah-checking-*.csv ~/finance/mybank-checking.csv. Perhaps the docs can be improved here.
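The renaming step can also be scripted outside make. Here is a minimal Python sketch of the same idea as the cp command above. stage_download is a hypothetical helper, and the download location is only an assumption. The sketch also assumes that picking the most recently modified match is the right choice when several files match. A plain cp with several matching sources and a single file destination would fail instead.

```python
import glob
import os
import shutil


def stage_download(pattern: str, dest: str) -> str:
    """Copy the newest file matching `pattern` to the fixed name `dest`.

    Hypothetical helper: reusing one destination name keeps hledger import's
    per-file-name state (the .latest.* file) consistent across downloads.
    """
    matches = glob.glob(os.path.expanduser(pattern))
    if not matches:
        raise FileNotFoundError(f"no downloads match {pattern!r}")
    newest = max(matches, key=os.path.getmtime)  # assumption: newest wins
    shutil.copyfile(newest, os.path.expanduser(dest))
    return newest
```

For example, you could call stage_download("~/Downloads/blah-blah-checking-*.csv", "~/finance/mybank-checking.csv") and then run hledger import ~/finance/mybank-checking.csv --rules-file bank.csv.rules -f all.journal, so that every import goes through the same file name.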

simonmichael, Apr 26 '22 21:04

Thanks, that does the trick indeed.

If you update the documentation, may I suggest connecting the following chapters somehow (maybe adding links between them, or even merging them)? It was difficult for me to find the information, since it is scattered across a long single-page manual.

https://hledger.org/1.24/hledger.html#deduplication
https://hledger.org/1.24/hledger.html#deduplicating-importing

kiteloopdesign, Apr 27 '22 07:04

These sections link to each other now.

simonmichael, Jul 14 '23 07:07