
adding unicode friendly clue parsing

Open Canonelis opened this issue 3 years ago • 6 comments

Changed the clue parsing algorithm to handle Unicode characters. The regular expression it now mimics is ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$ where "\a" represents all legal clue letters. When parsing a clue, I needed to avoid any string library functions that match using %d-style syntax.
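For readers who want to try the expression quoted above, here is a minimal Python sketch. It is not the Lua code from this change: the name `is_clue` is made up for illustration, and "\a" is approximated by "any Unicode letter" as Python's `re` module understands it, which is an assumption rather than the exact letter set the parser uses.

```python
import re

# Stand-in for "\a" (assumption): any Unicode letter, written as
# "word character that is neither a digit nor an underscore".
LETTER = r"[^\W\d_]"

CLUE_RE = re.compile(
    rf"^\s*{LETTER}+(-{LETTER}+)?"      # one word, optionally hyphenated
    r"(\s*-?\s*\d+|\s*(-|\s)\s*inf)"    # a number, or "inf" after a dash or space
    r"\s*$"
)

def is_clue(text: str) -> bool:
    """Return True if text has the shape described by the expression."""
    return CLUE_RE.match(text) is not None
```

With this sketch, `is_clue("tree 3")`, `is_clue("sea-horse 2")` and `is_clue("tree inf")` are accepted, while `is_clue("tree")` (no number) and `is_clue("two words 3")` (two separate words) are rejected.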

Canonelis avatar Jul 13 '21 11:07 Canonelis

So this accepts every letter matched by %a in Lua scripting, and also any Unicode character above 0x0370 except for some whitespace characters and dashes. It is very versatile and still allows all the same clue formats as before.
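The letter rule described above can be modelled roughly as follows. This is a Python sketch under stated assumptions, not the submitted Lua: `is_clue_letter` is a hypothetical name, %a is taken to mean the ASCII letters only (what it covers can depend on the Lua implementation), and the excluded "whitespace characters and dashes" are approximated with `str.isspace()` and the Unicode category Pd, since the exact exclusion list is not given here.

```python
import unicodedata

def is_clue_letter(ch: str) -> bool:
    """Rough model of the described letter set for a single character."""
    # Assumption: Lua's %a is taken to mean the ASCII letters only.
    if ch.isascii() and ch.isalpha():
        return True
    # Anything above 0x0370 counts, minus whitespace and dashes.
    # Assumption: "whitespace" = str.isspace(), "dash" = Unicode category Pd.
    if ord(ch) > 0x0370:
        return not ch.isspace() and unicodedata.category(ch) != "Pd"
    return False
```

Under these assumptions `is_clue_letter("a")` and `is_clue_letter("\u0416")` (a Cyrillic letter) are true, while `is_clue_letter("-")`, `is_clue_letter("\u2014")` (a long dash) and `is_clue_letter("\u3000")` (an ideographic space) are false.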

Canonelis avatar Jul 13 '21 11:07 Canonelis

If you're busy, I could provide a fairly exhaustive list of test cases. Is there anything I can do to help you add this to the project?

Canonelis avatar Jul 19 '21 07:07 Canonelis

Did some rigorous testing on it, found one flaw. Generated 2000 clues that should work and they did. Generated 5000 clues that shouldn't work and they didn't. This is ready.

Canonelis avatar Jul 27 '21 01:07 Canonelis

This would be good to add pretty soon since you have so many foreign decks. Right now the set of characters it allows in clues is fairly arbitrary. If the character's code mod 256 is in the range of A-Z or a-z or À-ÿ then it accepts it, otherwise it rejects it.
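To see why the mod-256 rule is arbitrary, here is a small Python model of it. It is written from the description above rather than from the existing Lua code, and `old_accepts` is a hypothetical name.

```python
def old_accepts(ch: str) -> bool:
    """Model of the current rule: only the low byte of the code point counts."""
    low = ord(ch) % 256
    return (
        0x41 <= low <= 0x5A      # A-Z
        or 0x61 <= low <= 0x7A   # a-z
        or 0xC0 <= low <= 0xFF   # À-ÿ
    )
```

In this model two neighbouring Cyrillic letters get different answers: `old_accepts("\u0430")` is false because 0x0430 mod 256 is 0x30, while `old_accepts("\u0441")` is true because 0x0441 mod 256 is 0x41, the code for "A".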

I've played a few games with it now and I think it's done.

Canonelis avatar Aug 06 '21 06:08 Canonelis

Here are 2 files you can copy and paste from: near legit clues(but not legit).txt and legit clues.txt. Each was randomly generated and filtered by the regular expression ^\s*\a+(-\a+)?(\s*-?\s*\d+|\s*(-|\s)\s*inf)\s*$, so with the allowed character sets it gets pretty weird, but for testing purposes it worked great. The digits 0-9 exist in many other languages, so I included them as well, which is why you might not see a normal number in each clue. For displaying and logging the clue, however, it is written as a normal digit.
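Converting digits from other scripts to normal digits for display can be sketched like this in Python. It only illustrates the idea: `normalize_digits` is a hypothetical name, and which digit sets the actual Lua code recognizes is not listed here, so the sketch assumes every character Unicode classifies as a decimal digit.

```python
import unicodedata

def normalize_digits(text: str) -> str:
    """Replace any Unicode decimal digit with its ASCII equivalent."""
    return "".join(
        # unicodedata.decimal() gives the digit's value 0-9 as an int.
        str(unicodedata.decimal(ch)) if ch.isdecimal() else ch
        for ch in text
    )
```

For example, `normalize_digits("tree \u0663")`, where the last character is the Arabic-Indic digit three, gives `"tree 3"`.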

Canonelis avatar Aug 25 '21 21:08 Canonelis

Here are the submitted changes to the code.

Canonelis avatar Aug 26 '21 10:08 Canonelis