kgt
kgt copied to clipboard
Exponential grow for case insensitive input.
Hi, Thanks for this tool. This is awesome!
I noticed some generator produce all the possibles combinaisons for case incentive input. It means this generate a list of item growing exponentially relative to the input length.
For instance if you use: (input: abnf)
test = "test"
Then you get your get. (output: "rrta")
add('test', Diagram(
Choice(0,
Terminal("test"),
Terminal("Test"),
Terminal("tEst"),
Terminal("TEst"),
Terminal("teSt"),
Terminal("TeSt"),
Terminal("tESt"),
Terminal("TESt"),
Terminal("tesT"),
Terminal("TesT"),
Terminal("tEsT"),
Terminal("TEsT"),
Terminal("teST"),
Terminal("TeST"),
Terminal("tEST"),
Terminal("TEST"))));
I was curious to know if this is something you want to keep or fix later. If you want to fix it later, would you prefer I try to submit PR?
I tried all the generator online using this tool https://arthursonzogni.com/Diagon/#code_area
exponential output:
- rrparcon
- rrl
- rrta
non exponential output:
- rrutf8
- rrtext
- svg
- html5
- xhtml5
- ebnfhtml5
- ebnfxhtml5
- blab
- abnf
- dot
- rdot
- rrdump
- rrtdump
Can't say (reached unimplemented)
- bnf
- wsn
- iso-ebnf
- rbnf
- sid
This is pretty terrible, I agree. I was trying to think about better ways to handle this, when I wrote it. Do you have any suggestions please?
The problem is that those dialects don't support case insensitive terminals, so we have to expand out every spelling.
I was thinking: instead of outputting every combination, to rewrite it using a sequence of alternatives:
Input:
http
output (current)
| http
| Http
| hTtp
| HTtp
| htTp
| HtTp
| hTTp
| HTTp
| httP
| HttP
| hTtP
| HTtP
| htTP
| HtTP
| hTTP
| HTTP
output (proposed;)
(h | H) (t | T) (t | T) (p | P)
Obviously, this is a very naive idea from me. I guess there are reasons this can't be done. I just found out this is implemented by permute_case
. I will take a look.
Yes, unfortunately that has a different meaning. For a grammar where there could be whitespace between tokens (depending on how lexing is defined), here you're introducing the possibility for space between each letter, and to produce multiple tokens for output. The original grammar there has only one token for the entire string.