hledger
hledger copied to clipboard
cannot import UNL file
UNL file is similar to CSV file, it uses pipe "|" as a field separator. UNL files are created from database dumps (INFORMIX). I prefer unl before csv as there is lower probability that character "|" is in user data.
This issue highlights several issues with CSV import...
I have found that when a file has extension unl then hledger tries to parse it as a journal, it ignores CSV import rules file. I tried to force import with --rules-file but it is ignored too. I have to rename file from unl to csv.
Example UNL file demo-sip.unl:
4002693275|2834339591|2014|9|4958.00|1|1|101026|2270.00|EUR|Gas|INO VS 10566007
Rules file: demo-sip.unl.rules
# skip the headings line:
#skip 1
separator |
date-format %Y-%m-%d
# BUG, I cannot build date from several fields, workarround, I use fixed date :-(
#date %3-%4-15
date 2014-09-14
# BUG, date %3-%4-15 is not expanded to 2014-9-15; any way to have 2014-09-15? (9 -> 09)
description %11|%1, date %3-%4-15 %3 %4
amount %9 %10
$ hledger print -f demo-sip.unl
hledger: /home/user/test/demo-sip.unl:1:11:
|
1 | 4002693275|2834339591|2014|9|4958.00|1|1|101026|2270.00|EUR|Gas|INO VS 10566007
| ^
unexpected '|'
expecting date separator or digit
I link unl files to csv files and run hledger again:
$ ln demo-sip.unl demo-sip.csv
$ ln demo-sip.unl.rules demo-sip.csv.rules
$ hledger print -f demo-sip.csv
2014-09-14 Gas|4002693275, date %3-%4-15 2014 9
expenses:unknown 2270.00 EUR
income:unknown -2270.00 EUR
Here's how the format of input files is detected: https://hledger.org/hledger.html#data-formats
UNL doesn't seem a very common format, can you point to a website or reference doc for it ?
In other words, I think you just need to add the csv: prefix to activate the CSV (Character-separated values) reader. As you found out, it is flexible enough to read UNL.
$ hledger print -f csv:demo-sip.unl
2014-09-14 Gas|4002693275, date %3-%4-15 2014 9
expenses:unknown 2270.00 EUR
income:unknown -2270.00 EUR
For the date constructing issues: hyphen - seems to disrupt parsing of those interpolated fields. Perhaps that should be fixed. As a workaround, I found that the alternate date separators . or / worked better, eg:
separator |
date %3/%4/15
description %11 | %1
amount %9 %10
hledger converts them to the canonical YYYY-MM-DD ISO format:
$ hledger print -f csv:demo-sip.unl
2014-09-15 Gas | 4002693275
expenses:unknown 2270.00 EUR
income:unknown -2270.00 EUR
hyphen - seems to disrupt parsing of those interpolated fields
And the reason is that we allow hyphen in field names, like fields foo, bar_baz, qux-quux. Arguably that is unnecessary. Changing it now might break some existing rules files.
What about extension, similar to bash. Variable can be in the format %qux-quux or %{qux-quux}. That means the date can be defined like date %{3}-%{4}-15. This syntax will open new possibilities to join strings, like %{id}EU. BTW, %3 is not variable name, it starts with a digit. In that case, parser could be smart and stop at '-'... Just a hint.
Other note is on UNL format. It is simple, really simple. Default separator is "|" but it can be changed to other character. Escape character is "\". "\" in data is written as "\\". unl is simpler than csv, no double quotes, etc. It comes from INFORMIX word, from UNLOAD command that dumps data from DB to a file.
https://www.ibm.com/docs/en/informix-servers/12.10?topic=statement-unload-file
I do not think that hledger should support unl format. On the other side, it is easy to write and read (and that is why I like it, I like KISS).
I agree that we could add support for %{3}. (I tried this syntax, just in case.) I feel we support either this or %(3), somewhere else in hledger but I don't remember where.
As long as we support foo-bar and foo-3 as field names, we probably need to also support 3-foo, so I think it wouldn't be worth making %NUM- a special case.
Thanks for the link. I think we support UNL format now, with our Character Separated Value reader (though I'm not sure about the escaping). I don't think it's well-known enough to be worth adding .unl and unl: to our recognised file extensions/prefixes (csv/tsv/ssv), but enough demand could change my mind.
Can this be closed ?
No bug here.