bug: queries don't seem to support non-ASCII chars
If I tag a page with "élément" (French for "element"), it seems I can't use it in queries because the parser fails on accented characters. This query will be marked as incorrect (but it's not clear to me whether it fails at parsing or at interpretation):
```query
élément
select name
```
The same happens with any accented attribute name in a page: it can't be used in the `where` clause of a query later.
I took a crack at this, and found what seems to be a plausible solution, only it still didn't work (and I lost the code between other PRs). I'll write what I found out, for my future reference or anyone else trying:
The query syntax is defined here, and we see that it starts with a TagIdentifier, because a query starts with the tag to look for:
https://github.com/silverbulletmd/silverbullet/blob/1635c417c3d925ff0766756eb5b063d8233878f4/common/markdown_parser/query.grammar#L23
What is allowed in a tag identifier is defined further down; I think that's what's rejecting letters with diacritics: https://github.com/silverbulletmd/silverbullet/blob/1635c417c3d925ff0766756eb5b063d8233878f4/common/markdown_parser/query.grammar#L128
I checked the Lezer docs and there isn't anything like `@unicodeLetters` (as we have in regex with `\p{L}`), but the next best thing I found is defining ranges of code points, like they do here:
https://github.com/lezer-parser/lezer-grammar/blob/64e55bd774a17e47fb600983b1f5390a11025562/src/lezer.grammar#L152
However, the `\u{a1}-\u{10ffff}` range cannot be copied directly, because it includes other whitespace characters and breaks the grammar parsing.
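To make the whitespace problem concrete, here is a minimal Python sketch that lists the characters in that range whose Unicode general category is a separator (Zs, Zl or Zp). It only illustrates why the range is too broad; which of these characters actually conflict with the whitespace tokens in `query.grammar` is an assumption I haven't verified, and the helper name `separators_in_range` is made up for this example.

```python
import unicodedata

# Bounds of the range used in the Lezer grammar: \u{a1}-\u{10ffff}
RANGE_START = 0xA1
RANGE_END = 0x10FFFF


def separators_in_range(start, end):
    """Return every character in [start, end] whose Unicode general
    category is a separator (Zs, Zl or Zp)."""
    return [
        chr(cp)
        for cp in range(start, end + 1)
        if unicodedata.category(chr(cp)).startswith("Z")
    ]


found = separators_in_range(RANGE_START, RANGE_END)
for ch in found:
    # Print the code point and its Unicode name for inspection
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")
```

Any character printed here would have to be carved out of the identifier range (for example by splitting it into several smaller ranges) before it could be used in a tag identifier rule.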
I tried changing this grammar, regenerating the files with scripts/generate.sh, and rebuilding the server, but I still keep seeing "Parse error". Is there a better way to debug than this? The Lezer forum only agrees it's hard.
Now I remember: this change does work, but not when the first letter is also non-ASCII. The grammar after the patch should allow it, so perhaps there's a regex somewhere else that rejects it?
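For anyone hunting for that regex, this is the kind of pattern that would produce exactly this symptom. It is a hypothetical Python sketch, not code from the SilverBullet source: `ASCII_FIRST`, `UNICODE_FIRST` and `is_tag` are names I made up, and the real check (if it exists) may look different.

```python
import re

# Hypothetical patterns, NOT taken from the SilverBullet source.
# First character restricted to ASCII, remaining characters Unicode-aware:
ASCII_FIRST = re.compile(r"[A-Za-z_][\w\-]*")
# First character may be any Unicode letter or underscore
# ([^\W\d] means "a word character that is not a digit"):
UNICODE_FIRST = re.compile(r"[^\W\d][\w\-]*")


def is_tag(pattern, text):
    """Return True if the whole text matches the given tag pattern."""
    return pattern.fullmatch(text) is not None


# Escapes are used so the samples are precomposed (U+00E9), not "e" + combining accent.
samples = ["element", "el\u00e9ment", "\u00e9l\u00e9ment"]
for s in samples:
    print(s, is_tag(ASCII_FIRST, s), is_tag(UNICODE_FIRST, s))
```

With `ASCII_FIRST`, only the last sample (the one starting with "é") is rejected, which matches what I'm seeing; `UNICODE_FIRST` accepts all three.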
Also, this works without any patches:
```query
page where tags = "élément"
```
I think the same issue applies to attributes:
Non-ASCII attributes:
- All ASCII [works: true]
- Other letters [działa: false]
The whole system SHOULD support Unicode scalar values wherever applicable (anywhere users may input something), IMHO.
This got fixed as a side effect of Lua Integrated Query.
The screenshot was taken from this page:
#elément
${query[[
from index.tag "elément"
select {
Nazwa = name,
["Długość"] = size}
]]}
Table columns whose names contain non-ASCII characters have to use the general form of the table constructor (`["Długość"] = size`), but this is standard Lua. It's probably worth including in the documentation, but I'm not sure what the best place would be.