Frames icon indicating copy to clipboard operation
Frames copied to clipboard

Parsing numerical values which use comma rather than a point

Open idontgetoutmuch opened this issue 1 year ago • 6 comments

Some countries use 3,1459 rather than 3.1459. I have for example

"LATITUDE","LONGITUDE"
"-21,5245377777","-39,6610013888"
"-9,5933511111","-35,8891097222"

But if I run

{-# LANGUAGE DataKinds, FlexibleContexts, QuasiQuotes, TemplateHaskell, TypeApplications #-}

{-# OPTIONS_GHC -Wall -Wno-type-defaults #-}

import           Frames
import qualified Pipes.Prelude as P

tableTypes "Row" "mwe.csv"

tbl2a :: IO [ColFun Maybe Row]
tbl2a = runSafeT . P.toListM $ readTableMaybe "mwe.csv"

main :: IO ()
main = tbl2a >>= mapM_ (putStrLn . show)

Then Frames seems to parse these values as doubles:

main
{Just LATITUDE :-> -2.15245377777e11, Just LONGITUDE :-> -3.96610013888e11}
{Just LATITUDE :-> -9.5933511111e10, Just LONGITUDE :-> -3.58891097222e11}

idontgetoutmuch avatar Jul 03 '23 17:07 idontgetoutmuch

row2 :: Rec (Either Text :. ElField) '[LATITUDE, LONGITUDE]
row2 = readRec ["-21,5245377777","-39,6610013888"]

row2
{Right LATITUDE :-> -2.15245377777e11, Right LONGITUDE :-> -3.96610013888e11}

but

Prelude> read "3,1459" :: Double
*** Exception: Prelude.read: no parse

idontgetoutmuch avatar Jul 04 '23 08:07 idontgetoutmuch

instance Parseable Double where
  -- Some CSV's export Doubles in a format like '1,000.00', filtering
  -- out commas lets us parse those sucessfully
  parse = fmap Definitely . fromText . T.filter (/= ',')

idontgetoutmuch avatar Jul 04 '23 09:07 idontgetoutmuch

With

parse = fmap Definitely . fromText

I get

main
{LATITUDE :-> "-21,5245377777", LONGITUDE :-> "-39,6610013888"}
{LATITUDE :-> "-9,5933511111", LONGITUDE :-> "-35,8891097222"}

@acowley I am not sure what the answer is here. Presumably somebody wanted to parse "1,000.00" as a 1000.00 so we can't just delete the offending filter.

idontgetoutmuch avatar Jul 04 '23 09:07 idontgetoutmuch

Actually now that I think about it, Frames should not interpret "1,000.00". If the number is not in the basic format "1000.00" then the user will get text and can parse it themself just like I will parse the text "3,1459" myself.

idontgetoutmuch avatar Jul 04 '23 09:07 idontgetoutmuch

There's this https://en.m.wikipedia.org/wiki/Decimal_separator but then someone would have to do something other than just filter out commas.

idontgetoutmuch avatar Jul 04 '23 09:07 idontgetoutmuch

See https://github.com/acowley/Frames/issues/53

idontgetoutmuch avatar Jul 04 '23 09:07 idontgetoutmuch