cannot print records with labels that contain Unicode characters
see for yourself
- Create files as follows:

  - `cassava-unicode.cabal`

        cabal-version: 3.0
        name: cassava-unicode
        version: 0.1.0.0

        executable cassava-unicode
          main-is: Main.hs
          build-depends: base, cassava, bytestring
          hs-source-dirs: app
          default-language: Haskell2010

  - `app/Main.hs`

        {-# LANGUAGE DeriveGeneric #-}

        module Main where

        import GHC.Generics
        import qualified Data.Csv as Csv
        import qualified Data.ByteString.Lazy as Bytes

        data Ξ = Ξ {ξ :: Int} deriving (Generic)

        instance Csv.ToNamedRecord Ξ
        instance Csv.DefaultOrdered Ξ

        main :: IO ()
        main = Bytes.putStr (Csv.encodeDefaultOrderedByName [Ξ {ξ = 0}])

- Run `cabal run` in command line.
what should happen
A single row of CSV is printed, preceded by a header.
what does happen
cassava-unicode: Uncaught exception ghc-internal:GHC.Internal.Exception.ErrorCall:
Data.Csv.Encoding.namedRecordToRecord: header contains name "\190" which is not present in the named record
HasCallStack backtrace:
error, called at src/Data/Csv/Encoding.hs:338:24 in cassava-0.5.4.1-a9e31c482b17af1df9a0e84111b5c8cff967d1a40cd48edc5a44579637a7df84:Data.Csv.Encoding
versions
- GHC 9.12.2
- cassava 0.5.4.1
- text 2.1.2
- bytestring 0.12.2.0
My investigation reveals the following:
- `toNamedRecord` works as it should, encoding the `ξ` character into `\206\190`, which is the correct encoding as far as I can tell.

      Csv.toNamedRecord (Ξ {ξ = 0})
      fromList [("\206\190","0")]

- `headerOrder` is doing something wrong, encoding the header as `\190`.

      Csv.headerOrder (undefined ∷ Ξ)
      ["\190"]

- A possible location of error is in this code:

      instance Selector s => GToNamedRecordHeader (M1 S s a)
        where
          gtoNamedRecordHeader opts m
              | null name = error "Cannot derive DefaultOrdered for constructors without selectors"
              | otherwise = [B8.pack (fieldLabelModifier opts (selName m))]
            where
              name = selName m

  It is using `B8.pack`, which does not encode Unicode characters correctly: it truncates each character to its low 8 bits, so `ξ` (U+03BE) becomes the single byte `\190` (0xBE) instead of the UTF-8 sequence `\206\190`. Other functions in the same module use `T.encodeUtf8 (T.pack …)`.
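The difference between the two conversions can be checked outside cassava. Below is a minimal sketch, assuming `text` is added to `build-depends`; the names `viaChar8` and `viaUtf8` are made up for this illustration and are not part of cassava.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

-- Hypothetical helper: the conversion used by the instance quoted above.
-- B8.pack keeps only the low 8 bits of every Char.
viaChar8 :: String -> BS.ByteString
viaChar8 = B8.pack

-- Hypothetical helper: the conversion suggested as a replacement.
-- The label is encoded as UTF-8.
viaUtf8 :: String -> BS.ByteString
viaUtf8 = T.encodeUtf8 . T.pack

main :: IO ()
main = do
  print (BS.unpack (viaChar8 "ξ"))  -- [190]
  print (BS.unpack (viaUtf8 "ξ"))   -- [206,190]
```

The first result matches the `\190` seen in the output of `headerOrder`, and the second matches the `\206\190` key produced by `toNamedRecord`, which would explain why the header lookup fails.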
@andreasabel May I ask you to look into this? If, as I have come to think, the fix is a one-line change (replacing `B8.pack` with `T.encodeUtf8`), it will be much faster for you to make the change and make a release than it would be for me to open a pull request, for you to review it, et cetera, et cetera.
Hi @kindaro, I'm ramping up as the new maintainer of cassava. I'll make the change over the next two days, but feel free to send a PR if you need this sooner.