hledger icon indicating copy to clipboard operation
hledger copied to clipboard

cannot import UNL file

Open PSLLSP opened this issue 3 years ago • 7 comments

UNL file is similar to CSV file, it uses pipe "|" as a field separator. UNL files are created from database dumps (INFORMIX). I prefer unl before csv as there is lower probability that character "|" is in user data.

This issue highlights several issues with CSV import...

I have found that when a file has extension unl then hledger tries to parse it as a journal, it ignores CSV import rules file. I tried to force import with --rules-file but it is ignored too. I have to rename file from unl to csv.

Example UNL file demo-sip.unl:

4002693275|2834339591|2014|9|4958.00|1|1|101026|2270.00|EUR|Gas|INO VS 10566007

Rules file: demo-sip.unl.rules

# skip the headings line:
#skip 1

separator |
date-format  %Y-%m-%d

# BUG, I cannot build date from several fields, workarround, I use fixed date :-(
#date %3-%4-15
date 2014-09-14

# BUG, date %3-%4-15 is not expanded to 2014-9-15; any way to have 2014-09-15? (9 -> 09)
description %11|%1, date %3-%4-15 %3 %4

amount %9 %10
$ hledger print -f demo-sip.unl
hledger: /home/user/test/demo-sip.unl:1:11:
  |
1 | 4002693275|2834339591|2014|9|4958.00|1|1|101026|2270.00|EUR|Gas|INO VS 10566007
  |           ^
unexpected '|'
expecting date separator or digit

I link unl files to csv files and run hledger again:

$ ln demo-sip.unl demo-sip.csv
$ ln demo-sip.unl.rules demo-sip.csv.rules
$ hledger print -f demo-sip.csv
2014-09-14 Gas|4002693275, date %3-%4-15 2014 9
    expenses:unknown     2270.00 EUR
    income:unknown      -2270.00 EUR

PSLLSP avatar Dec 08 '21 17:12 PSLLSP

Here's how the format of input files is detected: https://hledger.org/hledger.html#data-formats

UNL doesn't seem a very common format, can you point to a website or reference doc for it ?

simonmichael avatar Dec 08 '21 19:12 simonmichael

In other words, I think you just need to add the csv: prefix to activate the CSV (Character-separated values) reader. As you found out, it is flexible enough to read UNL.

$ hledger print -f csv:demo-sip.unl
2014-09-14 Gas|4002693275, date %3-%4-15 2014 9
    expenses:unknown     2270.00 EUR
    income:unknown      -2270.00 EUR

simonmichael avatar Dec 08 '21 19:12 simonmichael

For the date constructing issues: hyphen - seems to disrupt parsing of those interpolated fields. Perhaps that should be fixed. As a workaround, I found that the alternate date separators . or / worked better, eg:

separator |
date %3/%4/15
description %11 | %1
amount %9 %10

hledger converts them to the canonical YYYY-MM-DD ISO format:

$ hledger print -f csv:demo-sip.unl
2014-09-15 Gas | 4002693275
    expenses:unknown     2270.00 EUR
    income:unknown      -2270.00 EUR

simonmichael avatar Dec 08 '21 19:12 simonmichael

hyphen - seems to disrupt parsing of those interpolated fields

And the reason is that we allow hyphen in field names, like fields foo, bar_baz, qux-quux. Arguably that is unnecessary. Changing it now might break some existing rules files.

simonmichael avatar Dec 08 '21 19:12 simonmichael

What about extension, similar to bash. Variable can be in the format %qux-quux or %{qux-quux}. That means the date can be defined like date %{3}-%{4}-15. This syntax will open new possibilities to join strings, like %{id}EU. BTW, %3 is not variable name, it starts with a digit. In that case, parser could be smart and stop at '-'... Just a hint.

Other note is on UNL format. It is simple, really simple. Default separator is "|" but it can be changed to other character. Escape character is "\". "\" in data is written as "\\". unl is simpler than csv, no double quotes, etc. It comes from INFORMIX word, from UNLOAD command that dumps data from DB to a file.

https://www.ibm.com/docs/en/informix-servers/12.10?topic=statement-unload-file

I do not think that hledger should support unl format. On the other side, it is easy to write and read (and that is why I like it, I like KISS).

PSLLSP avatar Dec 08 '21 23:12 PSLLSP

I agree that we could add support for %{3}. (I tried this syntax, just in case.) I feel we support either this or %(3), somewhere else in hledger but I don't remember where.

As long as we support foo-bar and foo-3 as field names, we probably need to also support 3-foo, so I think it wouldn't be worth making %NUM- a special case.

Thanks for the link. I think we support UNL format now, with our Character Separated Value reader (though I'm not sure about the escaping). I don't think it's well-known enough to be worth adding .unl and unl: to our recognised file extensions/prefixes (csv/tsv/ssv), but enough demand could change my mind.

simonmichael avatar Dec 09 '21 01:12 simonmichael

Can this be closed ?

simonmichael avatar Jan 26 '22 18:01 simonmichael

No bug here.

simonmichael avatar Jul 14 '23 06:07 simonmichael