sedlex
match default (_) doesn't capture anything
I'm not sure if it is intentional (in which case it should be documented) or not, but if you match _, the lexeme you get is empty (that is, ""). This means you can't, for example, use the default as a way to catch bad characters and report what they were; you need to match "any" for that.
This seems like a slightly odd choice to me, but again, it probably should be either documented (and explained) or changed.
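For illustration, here is a minimal sketch of the "any" workaround described above. It assumes the sedlex ppx is enabled (for example via `(preprocess (pps sedlex.ppx))` in dune); the `token` type and the `next_token` function are hypothetical names made up for this example, not part of sedlex.

```ocaml
(* Hypothetical token type, for illustration only. *)
type token =
  | Word of string
  | Bad of string (* the offending character, encoded as UTF-8 *)
  | Eof

let rec next_token buf =
  match%sedlex buf with
  | Plus ('a' .. 'z') -> Word (Sedlexing.Utf8.lexeme buf)
  | Plus (' ' | '\t' | '\n') -> next_token buf (* skip whitespace *)
  | eof -> Eof
  (* `any` consumes exactly one code point, so the lexeme is non-empty
     and can be reported. *)
  | any -> Bad (Sedlexing.Utf8.lexeme buf)
  (* Default branch: nothing has been consumed here, so the lexeme
     would be "". With `eof` and `any` above, it should not be reached. *)
  | _ -> failwith "unreachable"
```

With this arrangement the bad character is available in the `any` branch, while the `_` branch only serves as the required catch-all.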
I think this is consistent with the semantics of formal languages. The largest language includes the empty word "". One would expect a wildcard (i.e., _) to match on anything, including the empty word.
The thing that may be a bit subtle is that _ matches lazily, rather than greedily. That is, _ will match the smallest possible word in the "full language", which is always "".
I'm not sure I love the behavior, but I think it should at least be documented.
(By "I don't love" I mean: matching the empty string means that you need to do something unusual to match unexpected characters, and that "_" isn't very useful.)