identifiers in Text.Parsec.Language not polymorphic enough

Open jwaldmann opened this issue 8 years ago • 9 comments

We have

emptyDef :: LanguageDef st 
type LanguageDef st = GenLanguageDef String st Identity
makeTokenParser :: Stream s m Char => GenLanguageDef s u m -> GenTokenParser s u m 

This means that makeTokenParser can only be used with emptyDef for String parsers, and the following is ill-typed (whatever the contents of { ... }):

lexer :: GenTokenParser ByteString () Identity
lexer = makeTokenParser $ emptyDef { ... }

jwaldmann avatar Jun 09 '16 01:06 jwaldmann

I was also hit by this when updating an old library using String to use Text. Would be great to have this.

mcfilib avatar Jun 27 '16 13:06 mcfilib

Why not open a PR fixing it?

mrkkrp avatar Jun 27 '16 14:06 mrkkrp

Well, there are some design choices, e.g.:

  • should LanguageDef get another type parameter? (this breaks code)
  • should there be some LanguageDef' with the more general type? (but then what should it be called?)
  • should emptyDef (etc.) get the more general type, or should there be an emptyDef'?
  • or should the generalized things keep their names but go into another module (which one?)

jwaldmann avatar Jun 27 '16 18:06 jwaldmann

Here's one way to fix this:

  1. Keep the current LanguageDef and TokenParser definitions (or deprecate them eventually if they turn out not to be useful anymore).
  2. Generalize the exports of Text.Parsec.Language to GenLanguageDef s u m or GenTokenParser s u m.

This should allow at least some users to upgrade without needing to update their code.
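
A minimal sketch of what point 2 could look like, written against the existing Text.Parsec.Token API. The name genEmptyDef is hypothetical and not part of parsec; its field values are modeled on emptyDef. The point is only that a single stream-polymorphic definition can be specialized both to the old LanguageDef alias and to a Text lexer:

```haskell
{-# LANGUAGE FlexibleContexts #-}
import Data.Functor.Identity (Identity)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec (ParseError, Stream, alphaNum, char, eof, letter, oneOf, parse, (<|>))
import qualified Text.Parsec.Token as Tok

-- Hypothetical generalized counterpart of emptyDef (not part of parsec).
-- Only the four character parsers depend on the stream type.
genEmptyDef :: Stream s m Char => Tok.GenLanguageDef s u m
genEmptyDef = Tok.LanguageDef
  { Tok.commentStart    = ""
  , Tok.commentEnd      = ""
  , Tok.commentLine     = ""
  , Tok.nestedComments  = True
  , Tok.identStart      = letter <|> char '_'
  , Tok.identLetter     = alphaNum <|> oneOf "_'"
  , Tok.opStart         = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , Tok.opLetter        = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , Tok.reservedOpNames = []
  , Tok.reservedNames   = []
  , Tok.caseSensitive   = True
  }

-- Existing code written against the String-only alias still type-checks,
-- including the usual record-update style.
oldStyleDef :: Tok.LanguageDef st
oldStyleDef = genEmptyDef { Tok.reservedNames = ["let", "in"] }

stringLexer :: Tok.GenTokenParser String () Identity
stringLexer = Tok.makeTokenParser oldStyleDef

-- The same definition also works for a Text stream.
textLexer :: Tok.GenTokenParser Text () Identity
textLexer = Tok.makeTokenParser genEmptyDef

identFromString :: String -> Either ParseError String
identFromString = parse (Tok.identifier stringLexer <* eof) "<string>"

identFromText :: Text -> Either ParseError String
identFromText = parse (Tok.identifier textLexer <* eof) "<text>"
```

Code that pins the type with an explicit LanguageDef st signature would be unaffected by such a change; code without signatures might need one to resolve the now-polymorphic stream type.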

OTOH, considering that there are several other problems with Text.Parsec.Language (#89, #93), it might make more sense to develop the current contents of Text.Parsec.Language in a separate package to avoid friction with parsec's otherwise slowish churn rate.

What do you think, @hvr?


Also, @mrkkrp, is there any megaparsec-based code that corresponds to (some of) Text.Parsec.Language?

sjakobi avatar May 16 '18 15:05 sjakobi

See the lexer modules; that's the closest you get. But I think they are written from scratch, as I didn't like Parsec's take on this.

mrkkrp avatar May 16 '18 15:05 mrkkrp

Adding a specific use case where this caused problems for me. I ran into this issue while trying to use the Language module to parse strings together with s-cargot. cc @aisamanra I'm not sure if there is a "better" way to do this. Specifically, the following code doesn't compile because of a String/Text mismatch:

data Atom = AString T.Text deriving (Show, Eq)

-- s-cargot parser
myParser = mkParser parseString

parseString :: Parser Atom
parseString = AString . T.pack <$> stringLiteral lexer

-- `String` here needs to be `Text`, but that doesn't match the return type of `makeTokenParser` so doesn't compile.
lexer :: GenTokenParser String u Identity
lexer = makeTokenParser haskellDef

Sorry I'm not going to be much use coming up with a solution, but thanks to everyone who has spent brain cycles on this! :)

xaviershay avatar Jun 02 '18 16:06 xaviershay
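
One possible workaround for the use case above, as a sketch only: it relies on Haskell's type-changing record update, assuming identStart, identLetter, opStart and opLetter are the only GenLanguageDef fields that mention the stream type, so overriding all four in a single update lets the result be a Text definition. The name textHaskellDef is hypothetical, the replacement parsers only approximate those of haskellDef, and the s-cargot mkParser call is left out so the example depends on parsec and text alone:

```haskell
import Data.Functor.Identity (Identity)
import Data.Text (Text)
import qualified Data.Text as T
import Text.Parsec (ParseError, alphaNum, eof, letter, oneOf, parse, (<|>))
import Text.Parsec.Language (haskellDef)
import Text.Parsec.Text (Parser)
import qualified Text.Parsec.Token as Tok

data Atom = AString Text deriving (Show, Eq)

-- Hypothetical helper: reuse haskellDef's comment syntax and reserved
-- names, but replace every field whose type mentions the stream.
-- All four fields must be set in the same update for the type to change.
textHaskellDef :: Tok.GenLanguageDef Text u Identity
textHaskellDef = haskellDef
  { Tok.identStart  = letter
  , Tok.identLetter = alphaNum <|> oneOf "_'#"
  , Tok.opStart     = oneOf ":!#$%&*+./<=>?@\\^|-~"
  , Tok.opLetter    = oneOf ":!#$%&*+./<=>?@\\^|-~"
  }

lexer :: Tok.GenTokenParser Text () Identity
lexer = Tok.makeTokenParser textHaskellDef

-- Same shape as parseString above, now over a Text stream.
parseString :: Parser Atom
parseString = AString . T.pack <$> Tok.stringLiteral lexer

-- Small driver for trying the parser without s-cargot.
parseAtom :: Text -> Either ParseError Atom
parseAtom = parse (parseString <* eof) "<input>"
```

Since Parser from Text.Parsec.Text is Parsec Text (), parseString has the type the original snippet asked for.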

@mrkkrp searching for "haskell text.parsec lexer modules" didn't uncover anything that looks obviously relevant. Do you have a specific link in mind?

xaviershay avatar Jun 02 '18 16:06 xaviershay

@xaviershay See Megaparsec.

mrkkrp avatar Jun 02 '18 17:06 mrkkrp

Ah, thanks. I wonder if that's API-compatible with s-cargot? Will have a play. I mean, I could just implement sexpr parsing myself (or presumably copy an existing Megaparsec implementation); it shouldn't be too difficult.

xaviershay avatar Jun 02 '18 18:06 xaviershay