ledger-autosync
ledger-autosync copied to clipboard
Doesn't strip the BOM from paypal csv downloads
I'm seeing this in both the pip-downloaded version and when I install from a clone of the github repo. Here's what the error looks like:
geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟
ledger-autosync \
-a 'Assets:Liquid:PayPal' \
-l main.ledger \
/Users/me/Nextcloud/ledger/personal/unprocessed/bom-test.csv \
Traceback (most recent call last):
File "/usr/local/bin/ledger-autosync", line 11, in <module>
load_entry_point('ledger-autosync==1.0.1', 'console_scripts', 'ledger-autosync')()
File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 353, in run
File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/cli.py", line 187, in import_csv
File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/sync.py", line 203, in parse_file
File "/usr/local/lib/python3.7/site-packages/ledger_autosync-1.0.1-py3.7.egg/ledgerautosync/converter.py", line 524, in make_converter
Exception: Cannot determine CSV type
geode❀ ~/Nextcloud/ledger/personal/unprocessed🐟
And here's what a friend added to my Makefile to fix it:
BOM := $(shell bash -c "echo -e '\xEF\xBB\xBF'")
$(INPUTDIR)/%-nobom.csv: $(INPUTDIR)/%.csv
sed '1s/^$(BOM)//' < $< \
| grep -v '"Order",".*",".*","","","0.00"' \
> $@
(Sorry, that's a bit messy, the extra stuff is in there for completion, so there's context if anyone wants to duplicate it.) The sed statement is the important part -- it searches for the BOM on line one of the file and replaces it with nothing. INPUTDIR is the path to my downloaded, unprocessed files, and we're making a cleaned up paypal-nobom.csv file that is dependent on the downloaded paypal.csv file. The grep line is to deal with another problem where ledger-autosync would choke whenever it encountered one of the numerous lines with no value in the amount field (why, paypal??), so we've added that as part of the BOM-removal process.
Thanks for this! It's supposed to strip the BOM, see https://github.com/egh/ledger-autosync/blob/master/ledgerautosync/sync.py#L186. Maybe the BOM is appearing somewhere besides the absolute beginning of the file, or perhaps my code is wrong? Can you send me a 2 line paypal.csv with the data anonymized so I can replicate? Thanks.
For a one-off or alternative method, running dos2unix
on the PayPal export suffices to remove the BOM.
A bit late to the party, but this breaks on Python 3.10 because csv.DictReader
expects f
to be opened as text while the f.read(3) == codecs.BOM_UTF8
comparison only works if the file is opened as bytes. This means that the BOM is not stripped, while csv.reader
(which is used in csv.DictReader
) seems to read the f
in bytes form (it's implemented in C, so haven't looked at the code of this), which then causes the BOM to be included in the fieldnames :confused: