cassava
Broken parser error messages
Currently, decode returns FromRecord a => Either String a. This is suboptimal because some error messages include the row that failed. If that row is UTF-8 encoded and contains, for example, Japanese text, the error message becomes an unreadable mess of characters. This is the offending code:
https://github.com/hvr/cassava/blob/545b86d60276c51ec29681f1917f1e5fb9b67c54/Data/Csv/Encoding.hs#L340-L343
(Specifically BL8.unpack)
Resolution options:
- As we already assume that the input is UTF-8 (see the Text FromField instance), instead of BL8.unpack we could try to decode UTF-8 first and, if that fails, fall back to BL8.unpack
- The Left part of the Either should be a byte string
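A minimal sketch of the first option, assuming the text and bytestring packages. The name renderInput is hypothetical and not part of cassava; it only illustrates the "try UTF-8, then fall back" idea for the bytes that end up in the error message.

```haskell
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BL8
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- | Hypothetical helper (not part of cassava): render raw input bytes
-- for inclusion in an error message. Valid UTF-8 is decoded properly;
-- anything else falls back to the byte-per-Char rendering of BL8.unpack.
renderInput :: BL.ByteString -> String
renderInput bs =
  case TE.decodeUtf8' (BL.toStrict bs) of
    Right t -> T.unpack t      -- input was valid UTF-8
    Left _  -> BL8.unpack bs   -- not UTF-8: keep the current behaviour
```

With this, a row containing Japanese text would show up as readable characters in the message, while input in some other encoding would look the same as it does today.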
What do you think?
I think we should instead have Either (String, ByteString) a: return both the error message and the corresponding raw Field/Row that caused the issue.
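Roughly, that shape could look like the sketch below. DecodeResult and withRawInput are hypothetical names used only for illustration, and whether the raw input would be a strict or lazy ByteString is an assumption here.

```haskell
import qualified Data.ByteString as B

-- | Hypothetical result type: the error message plus the raw bytes of
-- the Field/Row that caused it.
type DecodeResult a = Either (String, B.ByteString) a

-- | Hypothetical helper: attach the raw input to a failed parse and
-- leave successful results untouched.
withRawInput :: B.ByteString -> Either String a -> DecodeResult a
withRawInput raw = either (\msg -> Left (msg, raw)) Right
```

The caller then decides how to display the bytes (decode as UTF-8, hex-dump, or drop them), so the library itself does not have to guess the encoding.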