cassava icon indicating copy to clipboard operation
cassava copied to clipboard

Field parsers do no check for end of input

Open Shimuuar opened this issue 10 years ago • 4 comments

In numeric fields it could lead to silent truncation:

>>> import Data.Csv
>>> import Data.Vector (Vector)
>>> decode NoHeader  "12.7" :: Either String (Vector (Only Int))
Right (fromList [Only {fromOnly = 12}])

Or worse:

>>> decode NoHeader  "1e6" :: Either String (Vector (Only Int))
Right (fromList [Only {fromOnly = 1}])

Also it would be good idea to allow trailing (and probably leading) whitespaces since those are frequently added for readability in hand-written CSV

Shimuuar avatar Apr 20 '14 15:04 Shimuuar

Good catch. This is a bit odd since decode eventually ends up calling the parser called csv, defined in Encoding.hs:

csv :: FromRecord a => DecodeOptions -> AL.Parser (V.Vector a)
csv !opts = do
    vals <- records
    _ <- optional endOfLine
    endOfInput
    return $! V.fromList vals
  where
    records = do
        !r <- record (decDelimiter opts)
        if blankLine r
            then (endOfLine *> records) <|> pure []
            else case runParser (parseRecord r) of
                Left msg  -> fail $ "conversion error: " ++ msg
                Right val -> do
                    !vals <- (endOfLine *> records) <|> pure []
                    return (val : vals)

Note the call to endOfInput. Perhaps that attoparsec endOfInput combinator is broken. Alternatively perhaps the decimal combinator in attoparsec consumes the whole literal, even if it ignores part of it. I will have to investigate.

tibbe avatar Apr 20 '14 15:04 tibbe

No decimal doesn't do anything like that

Prelude Data.Attoparsec Data.Attoparsec.ByteString.Char8> parse decimal "12.7"
Done ".7" 12

Shimuuar avatar Apr 20 '14 16:04 Shimuuar

From what I can tell, all of the field parsing functions in Conversion.hs are using the attoparsec parseOnly function which silently ignores unconsumed input. The FromField parsers should be changed to use a full attoparsec parse and fail if the result isn't Done with an empty leftover string.

The documentation for the attoparsec parseOnly function neglects to mention that it succeeds even if there's left over input. But it's pretty clear from the source code:

parseOnly :: Parser a -> B.ByteString -> Either String a
parseOnly m s = case T.runParser m (I s) mempty Complete failK successK of
                  Fail _ _ err -> Left err
                  Done _ a     -> Right a
                  _            -> error "parseOnly: impossible error!"

pjones avatar Apr 29 '14 19:04 pjones

Great detective work. We should indeed switch to using parse then. I've filed a bug with attoparsec to get the docs fixed.

tibbe avatar Apr 29 '14 20:04 tibbe