
cannot print records with labels that contain Unicode characters

Open kindaro opened this issue 3 months ago • 2 comments

see for yourself

  1. Create files as follows:

    • cassava-unicode.cabal

      cabal-version:   3.0
      name:            cassava-unicode
      version:         0.1.0.0
      
      executable cassava-unicode
          main-is:          Main.hs
          build-depends:    base, cassava, bytestring
          hs-source-dirs:   app
          default-language: Haskell2010
      
    • app/Main.hs

      {-# LANGUAGE DeriveGeneric #-}
      
      module Main where
      
      import GHC.Generics
      import qualified Data.Csv as Csv
      import qualified Data.ByteString.Lazy as Bytes
      
      data Ξ = Ξ {ξ :: Int} deriving (Generic)
      
      instance Csv.ToNamedRecord Ξ
      instance Csv.DefaultOrdered Ξ
      
      main :: IO ()
      main = Bytes.putStr (Csv.encodeDefaultOrderedByName [Ξ {ξ = 0}])
      
  2. Run cabal run on the command line.

what should happen

A single row of CSV is printed, preceded by a header.

what does happen

cassava-unicode: Uncaught exception ghc-internal:GHC.Internal.Exception.ErrorCall:

Data.Csv.Encoding.namedRecordToRecord: header contains name "\190" which is not present in the named record

HasCallStack backtrace:
  error, called at src/Data/Csv/Encoding.hs:338:24 in cassava-0.5.4.1-a9e31c482b17af1df9a0e84111b5c8cff967d1a40cd48edc5a44579637a7df84:Data.Csv.Encoding

versions

  • GHC 9.12.2
  • cassava 0.5.4.1
  • text 2.1.2
  • bytestring 0.12.2.0

kindaro, Sep 28 '25 06:09

My investigation reveals the following:

  • toNamedRecord works as it should, encoding the ξ character into \206\190, which is the correct encoding as far as I can tell.

    Csv.toNamedRecord (Ξ {ξ = 0})
    fromList [("\206\190","0")]
    
  • headerOrder is where things go wrong: it encodes the header as the single byte \190, dropping the leading \206.

    Csv.headerOrder (undefined ∷ Ξ)
    ["\190"]
    
  • A possible location of error is in this code:

    instance Selector s => GToNamedRecordHeader (M1 S s a)
      where
        gtoNamedRecordHeader opts m
            | null name = error "Cannot derive DefaultOrdered for constructors without selectors"
            | otherwise = [B8.pack (fieldLabelModifier opts (selName m))]
          where name = selName m
    

    It uses B8.pack, which truncates every character to its low 8 bits instead of encoding it as UTF-8. That is exactly how ξ (U+03BE) ends up as the single byte \190. Other functions in the same module use T.encodeUtf8 (T.pack …).
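
To make the difference concrete, here is a minimal sketch that depends only on bytestring and text, not on cassava. The helper names packTruncating and packUtf8 are made up for this illustration and do not exist in cassava; they only mirror the two ways of turning a field name into a ByteString discussed above.

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as B8
import qualified Data.Text as T
import qualified Data.Text.Encoding as T

-- Hypothetical helper (not part of cassava): what the suspect code does.
-- B8.pack keeps only the low 8 bits of every Char.
packTruncating :: String -> B.ByteString
packTruncating = B8.pack

-- Hypothetical helper (not part of cassava): go through Text and
-- encode the name as UTF-8.
packUtf8 :: String -> B.ByteString
packUtf8 = T.encodeUtf8 . T.pack

main :: IO ()
main = do
  -- "\958" is ξ (U+03BE), written as an escape so the source stays ASCII.
  print (B.unpack (packTruncating "\958"))  -- prints [190]
  print (B.unpack (packUtf8 "\958"))        -- prints [206,190]
```

The first result matches the "\190" reported by headerOrder and the second matches the "\206\190" produced by toNamedRecord, which is consistent with the lookup failure in the error message. Whether this is the only place that needs changing in cassava is an assumption I have not verified.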

@andreasabel May I ask you to look into this? If, as I have come to think, the fix is a one-line change (replacing B8.pack with T.encodeUtf8 . T.pack), it will be much faster for you to make the change and make a release than it would be for me to open a pull request, for you to review it, and so on.

kindaro, Sep 28 '25 07:09

Hi @kindaro, I'm ramping up as the new maintainer of cassava. I'll make the change over the next two days, but feel free to send a PR if you need this sooner.

mchav, Sep 28 '25 11:09