
Feature request: provide a Template Haskell quasiquoter that generates the lexer definition in place

Open sergv opened this issue 8 years ago • 1 comments

Currently alex is a command-line tool that takes files in its own format and relies on support from Cabal-the-library to invoke the alex command, which preprocesses .x files before they are passed to ghc. This works reasonably well, but I think there is something to be gained if the alex package provided a quasiquoter that takes a lexer definition in the currently used format but spits out TH expressions instead of generating a new file.

E.g. take the tokens_scan_user.x test from the alex test suite. Instead of

{
module Main (main) where
import System.Exit
}

%wrapper "basic" -- Defines: AlexInput, alexGetByte, alexPrevChar

$digit = 0-9
$alpha = [a-zA-Z]
$ws    = [\ \t\n]

tokens :-

  5 / {\ u _ibt _l _iat -> u == FiveIsMagic} { \s -> TFive (head s) }
  $digit { \s -> TDigit (head s) }
  $alpha { \s -> TAlpha (head s) }
  $ws    { \s -> TWSpace (head s) }

{

data Token = TDigit Char
           | TAlpha Char
           | TWSpace Char
           | TFive Char -- Predicated only
           | TLexError
    deriving (Eq,Show)

data UserLexerMode = NormalMode
                   | FiveIsMagic
    deriving Eq

main | test1 /= result1 = exitFailure
     | test2 /= result2 = exitFailure
     -- all succeeded
     | otherwise        = exitWith ExitSuccess

run_lexer :: UserLexerMode -> String -> [Token]
run_lexer m s = go ('\n', [], s)
    where go i@(_,_,s') = case alexScanUser m i 0 of
                     AlexEOF             -> []
                     AlexError  _i       -> [TLexError]
                     AlexSkip   i' _len  ->                   go i'
                     AlexToken  i' len t -> t (take len s') : go i'

test1 = run_lexer FiveIsMagic "5 x"
result1 = [TFive '5',TWSpace ' ',TAlpha 'x']

test2 = run_lexer NormalMode "5 x"
result2 = [TDigit '5',TWSpace ' ',TAlpha 'x']
}

I'd like to write a TokensScanUser.hs file that looks like:

{-# LANGUAGE QuasiQuotes, TemplateHaskell #-}

module Main (main) where
import System.Exit

import Alex.TH

genLexer defaultLexer [alex|
%wrapper "basic" -- Defines: AlexInput, alexGetByte, alexPrevChar

$digit = 0-9
$alpha = [a-zA-Z]
$ws    = [\ \t\n]

tokens :-

  5 / {\u _ibt _l _iat -> u == FiveIsMagic} { \s -> TFive (head s) }
  $digit { \s -> TDigit (head s) }
  $alpha { \s -> TAlpha (head s) }
  $ws    { \s -> TWSpace (head s) }
|]

data Token = TDigit Char
           | TAlpha Char
           | TWSpace Char
           | TFive Char -- Predicated only
           | TLexError
    deriving (Eq,Show)

data UserLexerMode = NormalMode
                   | FiveIsMagic
    deriving Eq

main | test1 /= result1 = exitFailure
     | test2 /= result2 = exitFailure
     -- all succeeded
     | otherwise        = exitWith ExitSuccess

run_lexer :: UserLexerMode -> String -> [Token]
run_lexer m s = go ('\n', [], s)
    where go i@(_,_,s') = case alexScanUser m i 0 of
                     AlexEOF             -> []
                     AlexError  _i       -> [TLexError]
                     AlexSkip   i' _len  ->                   go i'
                     AlexToken  i' len t -> t (take len s') : go i'

test1 = run_lexer FiveIsMagic "5 x"
result1 = [TFive '5',TWSpace ' ',TAlpha 'x']

test2 = run_lexer NormalMode "5 x"
result2 = [TDigit '5',TWSpace ' ',TAlpha 'x']

Having such a quasiquoter would provide the following benefits:

  • It would provide an option to use alex independently of the system's preprocessor if, say, clang starts to behave funny
  • It could help end the confusion of text editors by overly long lines, as mentioned in #84
  • There would be no newlines in the generated files (because there would be no generated files) that may annoy someone, see #105
  • Users would be able to just start ghci in their project and load the lexer definition to play with it. There would no longer be any need to add the dist/build/ directory to the ghci path (or a different directory, depending on the build target and the build tool: cabal has one prefix here, stack has another depending on the snapshot)
  • Indexing with e.g. tag generators should also improve, because these programs would just skip the quasiquoter part and index all user-defined functions.

sergv, Jun 17 '17 13:06

Yes, this would be a great feature to have. Other people have done it before, e.g. I just found this: http://hackage.haskell.org/package/alex-meta

There are probably others. Maybe there already exists a good starting point?

simonmar, Jun 19 '17 08:06