
Date parsing is not aware of non-English month names

Open simonmichael opened this issue 3 years ago • 0 comments

Reported by pablo1107 in #plaintextaccounting: a date-format rule in CSV rules did not work as expected:

~/ledger master*
❯ export LC_TIME="es_AR.UTF-8"

~/ledger master*
❯ hledger print --rules-file vacaciones-tc.rules -f vacaciones-tc.csv
hledger: error: could not parse "30 abr 2022" as a date using date format "%d %h %Y"
record values: "30 abr 2022","MERCADOPAGO*QUILMESROCK          02/06 ","ARS  +383,33"
the date rule is:   %1
the date-format is: %d %h %Y
you may need to change your date rule, change your date-format rule, or change your skip rule
for m/d/y or d/m/y dates, use date-format %-m/%-d/%Y or date-format %-d/%-m/%Y

In fact hledger has no awareness of the system time locale / $LC_TIME; it hard codes the en_US time locale (as do essentially all Haskell programs, I would guess).

This makes it difficult to parse CSV containing non-English month/day names. Possibly it also manifests in other ways, though I don't know of any.
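The failure can be reproduced with just the time library, without hledger. The following is a minimal sketch, not hledger's actual code: `spanishLocale` and `parseWith` are hypothetical names introduced here, and the Spanish month abbreviations are typed in by hand for illustration rather than read from the system.

```haskell
import Data.Time (Day, TimeLocale (..), defaultTimeLocale, parseTimeM)

-- Hypothetical locale for illustration: defaultTimeLocale with only the
-- month names swapped for hand-written Spanish (full, abbreviated) pairs.
spanishLocale :: TimeLocale
spanishLocale = defaultTimeLocale
  { months =
      [ ("enero", "ene"), ("febrero", "feb"), ("marzo", "mar")
      , ("abril", "abr"), ("mayo", "may"), ("junio", "jun")
      , ("julio", "jul"), ("agosto", "ago"), ("septiembre", "sep")
      , ("octubre", "oct"), ("noviembre", "nov"), ("diciembre", "dic")
      ]
  }

-- Parse a date with the same format string as the failing CSV rule.
parseWith :: TimeLocale -> String -> Maybe Day
parseWith loc = parseTimeM True loc "%d %h %Y"

main :: IO ()
main = do
  -- English month names only, so "abr" is not recognised.
  print (parseWith defaultTimeLocale "30 abr 2022")  -- Nothing
  -- With Spanish month names the same input parses.
  print (parseWith spanishLocale "30 abr 2022")      -- Just 2022-04-30
```

The only thing that changes between the two calls is the `TimeLocale` argument, which is why getting hold of the right locale value is the interesting part of the problem.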

https://hackage.haskell.org/package/env-locale seems to be the way to get the system time locale. Eg:

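#!/usr/bin/env stack
-- stack runghc --verbosity info --package time --package env-locale

import Data.Time (Day, parseTimeM)
import System.Locale.Current (currentLocale)

main :: IO ()
main = do
  -- Ask env-locale for the system's current time locale
  ctl <- currentLocale
  -- Parse an abbreviated month name with that locale instead of the default one
  d <- parseTimeM False ctl "%b" "abr" :: IO Day
  print d

$ export LC_TIME=es_AR.UTF-8
$ ./a.hs
1970-04-01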

This is a slight can of worms though, similar to the text encoding discussion at #1834 etc. Mainly: if this capability is desirable, and it seems so, what is the best default, one that balances predictability, convenience for reading your local data, and convenience for reading foreign-language data? Probably we can follow whatever approach we settle on for text encoding.

simonmichael, Jun 22 '22 17:06