sedlex
Compl could work with regexps longer than a single character
Hello, I would like to know if there would be any objection to adding the possibility of using Compl with a regexp longer than a single character?
@EmileRolley, can you explain the semantics of such a construct?
I need this too. I'm not aware of how sedlex is implemented, but I haven't found a good way to represent Compl of a succession of chars.
The exact code I have is a lexer for a printf-like syntax, but instead of %d being the interpolation token, I designed $(), with the value inside the parens.
The regexps are defined like this:
let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']
let case_ident =
  [%sedlex.regexp?
    ('a' .. 'z' | '_' | '\''), Star (letter | '0' .. '9' | '_')]
let ident = [%sedlex.regexp? (letter | '_'), Star (letter | '0' .. '9' | '_')]
let variable = [%sedlex.regexp? Star (ident, '.'), case_ident]
let interpolation = [%sedlex.regexp? "$(", variable, ")"]
let rest = [%sedlex.regexp? Plus (Compl '$')]
I would like to define rest as [%sedlex.regexp? Plus (Compl "$(")]
Currently, as soon as there's a $ (without a ( after it), I can't handle it in rest.
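To make the failure concrete, a minimal driver over these rules might look like the following sketch (the polymorphic-variant tokens and the driver function are assumptions for illustration, not part of the original code):

let rec tokens buf acc =
  match%sedlex buf with
  | interpolation -> tokens buf (`Interp (Sedlexing.Utf8.lexeme buf) :: acc)
  | rest -> tokens buf (`Text (Sedlexing.Utf8.lexeme buf) :: acc)
  | eof -> List.rev acc
  (* A '$' that is not followed by '(' matches neither rule and ends up here. *)
  | _ -> failwith "lexing error"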
Can you write what you want with Compl '$' | ('$', Compl '(')?
Totally, but I'm not entirely sure if you can make this structure work, since combining Compl '$' with '$' generates a non-matchable rule, right?
> Can you write what you want with Compl '$' | ('$', Compl '(')?
Yes, that's what I ended up doing, and it would be convenient to be able to simply write Compl "mystring" instead of Compl 'm' | ('m', Compl 'y') | ("my", Compl "s") | ... | ("mystrin", Compl "g").
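Written out for the two-character token $(, that workaround amounts to the following (a sketch reusing the rule name from the earlier snippet):

let rest = [%sedlex.regexp? Plus (Compl '$' | ('$', Compl '('))]
(* Note: a '$' that is the very last character of the input still matches
   neither alternative, so it would need its own case in the lexer. *)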
This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...
> This doesn't seem to be the way you are supposed to write a tokenizer in general, or really what regular expressions are for as such...

What is the way?
I don't know what your whole language is, but the reasonable thing to do is to write a grammar for it and to tokenize lexemes that you see rather than trying to not tokenize lexemes you don't see.
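For what it's worth, one sketch of that approach for the syntax described above uses two lexer entry points, one for plain text and one for the inside of $(...); the token type and function names here are assumptions, not an established sedlex idiom:

type token = TEXT of string | DOLLAR_LPAREN | IDENT of string | DOT | RPAREN | EOF

let letter = [%sedlex.regexp? 'a' .. 'z' | 'A' .. 'Z']
let ident = [%sedlex.regexp? (letter | '_'), Star (letter | '0' .. '9' | '_')]

(* Outside an interpolation: "$(" opens one, everything else is text. *)
let token_text buf =
  match%sedlex buf with
  | "$(" -> DOLLAR_LPAREN
  | Plus (Compl '$') -> TEXT (Sedlexing.Utf8.lexeme buf)
  | "$" -> TEXT "$" (* a lone '$' is just text *)
  | eof -> EOF
  | _ -> failwith "unexpected input"

(* Inside an interpolation: only identifiers, '.', and the closing ')'. *)
let token_interp buf =
  match%sedlex buf with
  | ident -> IDENT (Sedlexing.Utf8.lexeme buf)
  | "." -> DOT
  | ")" -> RPAREN
  | eof -> EOF
  | _ -> failwith "unexpected input"

The caller (or the parser) switches from token_text to token_interp after DOLLAR_LPAREN and back after RPAREN, so no negation of a multi-character string is needed.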