sedlex
Compl could work with regexps longer than a single character
Hello, I would like to know if there would be any objection to adding the possibility of using Compl with a regexp longer than a single character?
@EmileRolley, can you explain the semantics of such a construct?
I need this too. I'm not aware of how sedlex is implemented, but I haven't found a good way to represent Compl of a succession of chars.
The exact code I have is a lexer for a printf-like syntax, but instead of %d being the interpolation token, I designed $(), with the value inside the parens.
The regexps are defined like this:
let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']
let case_ident =
  [%sedlex.regexp?
    ('a' .. 'z' | '_' | '\''), Star (letter | '0' .. '9' | '_')]
let ident = [%sedlex.regexp? (letter | '_'), Star (letter | '0' .. '9' | '_')]
let variable = [%sedlex.regexp? Star (ident, '.'), case_ident]
let interpolation = [%sedlex.regexp? "$(", variable, ")"]
let rest = [%sedlex.regexp? Plus (Compl '$')]
I would like to define rest as [%sedlex.regexp? Plus (Compl "$(")]
Currently, as soon as there's a $ (without a ( after it), I can't handle it in rest.
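To make the failure concrete, a minimal driver over these rules might look like the following sketch (the polymorphic-variant tokens and the driver function are assumptions for illustration, not part of the original code):

let rec tokens buf acc =
  match%sedlex buf with
  | interpolation -> tokens buf (`Interp (Sedlexing.Utf8.lexeme buf) :: acc)
  | rest -> tokens buf (`Text (Sedlexing.Utf8.lexeme buf) :: acc)
  | eof -> List.rev acc
  (* A '$' that is not followed by '(' matches neither rule and ends up here. *)
  | _ -> failwith "lexing error"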
Can you write what you want with Compl '$' | ('$', Compl '(')?
Totally, but I'm not entirely sure if you can make this structure work, since combining Compl '$' with '$' generates a non-matchable rule, right?
> Can you write what you want with Compl '$' | ('$', Compl '(')?
Yes, that's what I ended up doing, and it would be convenient to be able to simply write Compl "mystring" instead of Compl 'm' | ('m', Compl 'y') | ("my", Compl "s") | ... | ("mystrin", Compl "g").
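Written out for the two-character token $(, that workaround amounts to the following (a sketch reusing the rule name from the earlier snippet):

let rest = [%sedlex.regexp? Plus (Compl '$' | ('$', Compl '('))]
(* Note: a '$' that is the very last character of the input still matches
   neither alternative, so it would need its own case in the lexer. *)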
This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...
> This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...

What is the way?
I don't know what your whole language is, but the reasonable thing to do is to write a grammar for it and to tokenize lexemes that you see rather than trying to not tokenize lexemes you don't see.
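For what it's worth, one sketch of that approach for the syntax described above uses two lexer entry points, one for plain text and one for the inside of $(...); the token type and function names here are assumptions, not an established sedlex idiom:

type token = TEXT of string | DOLLAR_LPAREN | IDENT of string | DOT | RPAREN | EOF

let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']
let ident = [%sedlex.regexp? (letter | '_'), Star (letter | '0' .. '9' | '_')]

(* Outside an interpolation: "$(" opens one, everything else is text. *)
let token_text buf =
  match%sedlex buf with
  | "$(" -> DOLLAR_LPAREN
  | Plus (Compl '$') -> TEXT (Sedlexing.Utf8.lexeme buf)
  | "$" -> TEXT "$" (* a lone '$' is just text *)
  | eof -> EOF
  | _ -> failwith "unexpected input"

(* Inside an interpolation: only identifiers, '.', and the closing ')'. *)
let token_interp buf =
  match%sedlex buf with
  | ident -> IDENT (Sedlexing.Utf8.lexeme buf)
  | "." -> DOT
  | ")" -> RPAREN
  | eof -> EOF
  | _ -> failwith "unexpected input"

The caller (or the parser) switches from token_text to token_interp after DOLLAR_LPAREN and back after RPAREN, so no negation of a multi-character string is needed.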