Add validations to prevent importing descriptions with invalid characters (such as `;`)

Open jfly opened this issue 5 months ago • 4 comments

As per the discussion on https://github.com/simonmichael/hledger/issues/1871, semicolons are not allowed in descriptions.

However, hledger currently does strange things when you give it data with descriptions with these invalid characters. For example, here's a CSV with a semicolon in a description:

$ cat in.csv
2020-01-01,"a description; with a semicolon","this is a comment",50
$ cat in.csv.rules
fields date,description,comment,amount
decimal-mark ,

If I parse this with hledger, I get the following journal:

$ hledger -f in.csv print
2020-01-01 a description; with a semicolon  ; this is a comment
    expenses:unknown              50
    income:unknown               -50

I'd rather get an error here about an invalid description.

The hledger add TUI has a similar quirk. When prompted for a description, I am allowed to enter a string with a semicolon, and it does not error out at all, and instead produces a journal that represents something different.

jfly, Jun 24 '25 17:06

Hmm, I see. Wouldn’t an error be less convenient than the current behaviour? I guess semicolons show up in CSV descriptions sometimes. Currently that part of the description will become the start of a comment instead. Is that often problematic? I guess another option would be to replace semicolons there with a comma or underscore.

simonmichael, Jun 24 '25 17:06

> Wouldn’t an error be less convenient than the current behaviour?

IMO, no. I was importing CSVs from my bank with `;` characters in the field I was using as the description, and had to investigate why my imports were losing pieces of their descriptions. To me, it's better to fail fast here than to do something surprising.

jfly, Jun 24 '25 17:06

I hear that @jfly.

Perhaps better than the current behaviour would be for the CSV reader to replace semicolons within text fields with underscore, or nothing. And perhaps print a warning if it did that.

If it gave an error, what would be the workaround in that case? The user could preprocess the CSV, replacing semicolons in descriptions with `_`. That does not sound easy to do robustly. (Consider semicolon-separated values, e.g.)

simonmichael, Oct 02 '25 02:10
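
For reference, the preprocessing workaround described above could look roughly like the following Python sketch. It is not part of hledger; `replace_semicolons` and its parameters are hypothetical names chosen for illustration. It parses the file with Python's `csv` module rather than substituting on raw text, so that quoted fields and semicolon-delimited files are both handled, and it assumes the description lives in a known column.

```python
import csv
import io


def replace_semicolons(csv_text, column, replacement="_", delimiter=","):
    """Return csv_text with ';' replaced inside one column (0-based index).

    Hypothetical helper for illustration only. Because the text is parsed
    as CSV first, a ';' used as the field delimiter is left alone and only
    a ';' inside the chosen field's value is replaced.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        # Skip short or empty records instead of failing on them.
        if column < len(row):
            row[column] = row[column].replace(";", replacement)
        writer.writerow(row)
    return out.getvalue()


# The record from the issue description: the description is column 1.
sample = '2020-01-01,"a description; with a semicolon","this is a comment",50\n'
cleaned = replace_semicolons(sample, column=1)
print(cleaned, end="")
```

The same function covers the semicolon-separated case by passing `delimiter=";"`, which is the part that a plain `sed`-style substitution would get wrong.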

> Perhaps better than the current behaviour would be for the CSV reader to replace semicolons within text fields with underscore, or nothing.

It seems reasonable for hledger to have code for this, but IMO it should not be the default behavior. I'd prefer that the default be to error out (as the user is asking hledger to do something it simply cannot do), and for the error message to suggest that the user either pre-process the CSV or enable hledger's "replace semicolons with " behavior.

jfly, Oct 06 '25 22:10
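
To make the fail-fast proposal concrete, here is a minimal Python sketch of the kind of check being requested. It is only an illustration of the idea, not how hledger's CSV reader is implemented; `find_invalid_descriptions`, `validate_descriptions`, and `InvalidDescription` are hypothetical names, and the error wording is made up.

```python
import csv
import io


class InvalidDescription(ValueError):
    """Raised when a description contains a character a journal description cannot hold."""


def find_invalid_descriptions(csv_text, column, forbidden=";"):
    """Return (record_number, description) pairs whose description contains `forbidden`."""
    reader = csv.reader(io.StringIO(csv_text))
    return [
        (record_number, row[column])
        for record_number, row in enumerate(reader, start=1)
        if column < len(row) and forbidden in row[column]
    ]


def validate_descriptions(csv_text, column, forbidden=";"):
    """Fail fast: raise on the first offending record instead of importing it."""
    problems = find_invalid_descriptions(csv_text, column, forbidden)
    if problems:
        record_number, description = problems[0]
        raise InvalidDescription(
            f"record {record_number}: description {description!r} contains "
            f"{forbidden!r}; pre-process the CSV or opt in to replacing it"
        )


sample = '2020-01-01,"a description; with a semicolon","this is a comment",50\n'
try:
    validate_descriptions(sample, column=1)
except InvalidDescription as err:
    print(f"error: {err}")
```

Reporting the record number and the offending value is what lets the user find the bad row in a large bank export, which is the investigation step the silent behaviour currently forces.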