ledger-autosync icon indicating copy to clipboard operation
ledger-autosync copied to clipboard

Doesn't strip the BOM from paypal csv downloads

Open hollyhudson opened this issue 5 years ago • 3 comments

I'm seeing this in both the pip-downloaded version and when I install from a clone of the github repo. Here's what the error looks like:

geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟
ledger-autosync \
		-a 'Assets:Liquid:PayPal' \
		-l main.ledger \
		/Users/me/Nextcloud/ledger/personal/unprocessed/bom-test.csv \
Traceback (most recent call last):
  File "/usr/local/bin/ledger-autosync", line 11, in <module>
    load_entry_point('ledger-autosync==1.0.1', 'console_scripts', 'ledger-autosync')()
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 353, in run
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 187, in import_csv
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/sync.py", line 203, in parse_file
  File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/converter.py", line 524, in make_converter
Exception: Cannot determine CSV type
geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟

And here's what a friend added to my Makefile to fix it:

BOM := $(shell bash -c "echo -e '\xEF\xBB\xBF'")
$(INPUTDIR)/%-nobom.csv: $(INPUTDIR)/%.csv
    sed '1s/^$(BOM)//' < $< \
    | grep -v '"Order",".*",".*","","","0.00"' \
    > $@

(Sorry, that's a bit messy, the extra stuff is in there for completion, so there's context if anyone wants to duplicate it.) The sed statement is the important part -- it searches for the BOM on line one of the file and replaces it with nothing. INPUTDIR is the path to my downloaded, unprocessed files, and we're making a cleaned up paypal-nobom.csv file that is dependent on the downloaded paypal.csv file. The grep line is to deal with another problem where ledger-autosync would choke whenever it encountered one of the numerous lines with no value in the amount field (why, paypal??), so we've added that as part of the BOM-removal process.

hollyhudson avatar Jun 17 '19 13:06 hollyhudson

Thanks for this! It's supposed to strip the BOM, see https://github.com/egh/ledger-autosync/blob/master/ledgerautosync/sync.py#L186. Maybe the BOM is appearing somewhere besides the absolute beginning of the file, or perhaps my code is wrong? Can you send me a 2 line paypal.csv with the data anonymized so I can replicate? Thanks.

egh avatar Jun 18 '19 04:06 egh

For a one-off or alternative method, running dos2unix on the PayPal export suffices to remove the BOM.

colindean avatar Dec 11 '19 02:12 colindean

A bit late to the party, but this breaks on Python 3.10 because csv.DictReader expects f to be opened as text while the f.read(3) == codecs.BOM_UTF8 comparison only works if the file is opened as bytes. This means that the BOM is not stripped, while csv.reader (which is used in csv.DictReader) seems to read the f in bytes form (it's implemented in C, so haven't looked at the code of this), which then causes the BOM to be included in the fieldnames :confused:

torfjelde avatar Feb 25 '24 01:02 torfjelde