avro-schema
Support UTF-8 in record names, field names and enums
- Should we check utf-8 validity?
- I think yes, because it seems there is no way to ban certain symbols in an encoding-unaware way.
- But once we have checked that a name is valid UTF-8, we can still use the built-in regexps (this avoids rewriting much of the internals).
- Should we check for some symbols like period or zero byte?
- The period at least, since it separates namespace components in a fullname (see, for example, fullname in frontend.lua).
- How should this feature coexist with the existing utf8_enums flag?
- I think we should just keep utf8_enums and prefer the new behaviour when both flags are provided. That said, removing utf8_enums is unlikely to hurt anyone.
- Use tarantool facilities for identifiers?
- Zero-cost option: don't use tarantool identifiers and don't perform any validity check.
- Use tarantool identifiers. This seems to be the better option. There are two possible approaches (both require a new utf8 module):
- Add the forbidden symbols to identifier_check* and expose identifier.c to Lua (as part of the utf8 module).
- Expose identifier.c to Lua (as part of the utf8 module) and traverse the identifier with utf8.next to look for forbidden symbols.
Blocked by: https://github.com/tarantool/tarantool/issues/3405
The feature should be enabled only under a flag, to preserve compatibility with the Avro spec.
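
To make the proposed check concrete, here is a rough sketch of the "validate UTF-8 first, then look for forbidden symbols" flow discussed above. It is written in Python purely for illustration and is not the avro-schema or tarantool implementation. The function name `check_name` and the `FORBIDDEN` set are hypothetical, and the choice of forbidden symbols (period and zero byte) is an assumption taken from the open question above, not a settled decision.

```python
# Assumption: the forbidden symbols are the period and the zero byte,
# as floated in the discussion; the final list is still undecided.
FORBIDDEN = {".", "\x00"}


def check_name(raw: bytes) -> str:
    """Return the decoded name, or raise ValueError if it is unacceptable."""
    try:
        # Strict decoding rejects malformed UTF-8 sequences.
        name = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(f"name is not valid UTF-8: {exc}") from None
    if not name:
        raise ValueError("name is empty")
    # Walk the name code point by code point, which is roughly what a
    # utf8.next-based traversal would do on the Lua side.
    for ch in name:
        if ch in FORBIDDEN:
            raise ValueError(f"forbidden symbol U+{ord(ch):04X} in name")
    return name
```

With this sketch, a name such as `"поле".encode("utf-8")` is accepted, while `b"a.b"`, `b"a\x00b"`, and byte strings that are not valid UTF-8 are rejected. Because the symbol check runs on decoded code points, it cannot accidentally match a byte in the middle of a multi-byte sequence, which is the concern behind the encoding-unaware approach.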