Add validations to prevent importing descriptions with invalid characters (such as `;`)
As per the discussion on https://github.com/simonmichael/hledger/issues/1871, semicolons are not allowed in descriptions.
However, hledger currently behaves in surprising ways when given data whose descriptions contain these invalid characters. For example, here's a CSV with a semicolon in a description:
$ cat in.csv
2020-01-01,"a description; with a semicolon","this is a comment",50
$ cat in.csv.rules
fields date,description,comment,amount
decimal-mark ,
If I parse this with hledger, I get the following journal:
$ hledger -f in.csv print
2020-01-01 a description; with a semicolon ; this is a comment
    expenses:unknown 50
    income:unknown -50
I'd rather get an error here about an invalid description.
The `hledger add` TUI has a similar quirk. When prompted for a description, I can enter a string containing a semicolon; it does not error out, and instead produces a journal that represents something different from what I typed.
Hmm, I see. Wouldn’t an error be less convenient than the current behaviour? I guess semicolons show up in CSV descriptions sometimes. Currently that part of the description becomes the start of a comment instead. Is that often problematic? I guess another option would be to replace semicolons there with a comma or underscore.
> Wouldn’t an error be less convenient than the current behaviour?
IMO, no. I was importing CSVs from my bank with ";" chars in the field I was using as the description, and had to investigate why my imports were losing pieces of their descriptions. To me, it's better to fail-fast here than to do something surprising.
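To make the fail-fast idea concrete, here is a minimal sketch of the kind of check I mean, written as a standalone Python script rather than as hledger code. The function name `find_semicolon_descriptions` and the assumption that the description is the second column (matching the `fields` rule above) are mine, purely for illustration.

```python
import csv
import io


def find_semicolon_descriptions(csv_text, description_index=1):
    """Return (row_number, description) pairs whose description contains ';'.

    description_index is an assumption: it matches the example rules file,
    where the description is the second CSV field.
    """
    offenders = []
    reader = csv.reader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        if len(row) > description_index and ";" in row[description_index]:
            offenders.append((row_number, row[description_index]))
    return offenders


sample = '2020-01-01,"a description; with a semicolon","this is a comment",50\n'

# Report every offending row instead of silently importing it.
for row_number, description in find_semicolon_descriptions(sample):
    print(f"row {row_number}: description contains ';': {description!r}")
```

A check like this reports the problem before import, rather than leaving me to notice later that part of a description has gone missing.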
I hear that @jfly.
Perhaps better than the current behaviour would be for the CSV reader to replace semicolons within text fields with underscore, or nothing. And perhaps print a warning if it did that.
If it gave an error, what would be the workaround in that case? The user could preprocess the CSV, replacing semicolons in descriptions with `_`. This does not sound easy to do robustly. (Consider semicolon-separated values, e.g.)
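For reference, a rough sketch of what such preprocessing could look like if done with a real CSV parser instead of a plain text substitution. This is not hledger code; the function name `replace_semicolons`, the column index, and the `_` replacement are assumptions for illustration. Parsing first means only the description field is touched, even when the file itself is semicolon-separated.

```python
import csv
import io


def replace_semicolons(csv_text, description_index=1, replacement="_", delimiter=","):
    """Rewrite CSV text, replacing ';' inside the description field only.

    description_index and replacement are illustrative assumptions.
    """
    out = io.StringIO()
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        if len(row) > description_index:
            row[description_index] = row[description_index].replace(";", replacement)
        writer.writerow(row)
    return out.getvalue()


comma_separated = '2020-01-01,"a description; with a semicolon","this is a comment",50\n'
print(replace_semicolons(comma_separated), end="")

# Semicolon-separated input: the field separators are preserved and only the
# semicolon inside the quoted description is replaced.
semicolon_separated = '2020-01-01;"a; b";"note";50\n'
print(replace_semicolons(semicolon_separated, delimiter=";"), end="")
```

The user still has to know which column holds the description and which separator the file uses, which is part of why this is awkward to ask of everyone.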
> Perhaps better than the current behaviour would be for the CSV reader to replace semicolons within text fields with underscore, or nothing.
It seems reasonable for hledger to have code for this, but IMO not the default behavior. I'd prefer that the default behavior be to error out (as the user is asking hledger to do something it simply cannot do), and for the error message to suggest the user either pre-process the CSV or to enable hledger's "replace semicolons with