tinyusdz
[Enhancement] Support UTF8 Identifier
TinyUSDZ now supports UTF-8 identifiers by default in the dev branch.
https://github.com/PixarAnimationStudios/USD/pull/2120
https://openusd.org/release/api/_usd__page__u_t_f_8.html
- [x] Basic UTF-8 processing
- https://github.com/syoyo/tinyusdz/blob/dev/src/str-util.hh
- [x] Support UTF-8 strings in identifiers (e.g. Prim names)
- [x] UTF-8 Default Identifier validation support: https://github.com/syoyo/tinyusdz/commit/de6b5456297aa01e1640948bd1b39bbbb536eeb3
- [x] UTF-8 support in isValidIdentifier https://github.com/syoyo/tinyusdz/blob/919fb30c4c34a5548ed2fcba058a262d81eb2a42/src/str-util.hh#L253
- [ ] MakeValidIdentifier for UTF-8 identifiers?
- [ ] Support sorting UTF-8 strings
- [ ] Provide UTF-8 normalization support? (If we do, use utf8proc.h https://github.com/JuliaStrings/utf8proc)
- pxrUSD recommends NFC, but NFKC is preferred for multilingual environments, so TinyUSDZ would recommend NFKC for normalization.
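For the sorting item, one simple option is to sort by Unicode code point. A minimal sketch, assuming code point order is acceptable (no locale-aware collation) and that the strings are already valid UTF-8. `sort_utf8_by_codepoint` is a hypothetical name, not an existing TinyUSDZ function:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of TinyUSDZ).
// Sorts UTF-8 encoded names in Unicode code point order.
// This relies on a property of UTF-8: comparing two valid strings byte by
// byte, as unsigned bytes, gives the same order as comparing code points.
inline void sort_utf8_by_codepoint(std::vector<std::string>& names) {
  std::sort(names.begin(), names.end(),
            [](const std::string& a, const std::string& b) {
              return std::lexicographical_compare(
                  a.begin(), a.end(), b.begin(), b.end(),
                  [](char x, char y) {
                    // Compare as unsigned so bytes >= 0x80 sort after ASCII.
                    return static_cast<unsigned char>(x) <
                           static_cast<unsigned char>(y);
                  });
            });
}
```

Without normalization, canonically equivalent strings (e.g. precomposed vs. decomposed forms) can still sort to different positions, which is one reason the normalization item above matters.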
For efficient UTF-8 processing, see:
Validating UTF-8 In Less Than One Instruction Per Byte https://arxiv.org/abs/2010.03090
https://github.com/simdjson/simdjson
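As a reference point for what a fast validator has to check, here is a plain scalar sketch. It is not the SIMD algorithm from the paper above and not the code in `str-util.hh`; `is_valid_utf8_scalar` is a hypothetical name. It rejects stray or missing continuation bytes, overlong encodings, surrogates, and code points above U+10FFFF:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scalar baseline (not part of TinyUSDZ or simdjson).
// Returns true if `s` is well-formed UTF-8.
inline bool is_valid_utf8_scalar(const std::string& s) {
  // Smallest code point that legitimately needs 2, 3, or 4 bytes.
  static const std::uint32_t kMinForLen[5] = {0, 0, 0x80, 0x800, 0x10000};

  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (b0 < 0x80) {
      len = 1; cp = b0;
    } else if ((b0 & 0xE0) == 0xC0) {
      len = 2; cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3; cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4; cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (n - i < len) return false;  // sequence truncated at end of input

    for (std::size_t k = 1; k < len; ++k) {
      const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    if (len > 1 && cp < kMinForLen[len]) return false;  // overlong encoding
    if (cp > 0x10FFFF) return false;                    // beyond Unicode range
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;     // UTF-16 surrogate
    i += len;
  }
  return true;
}
```

For example, `is_valid_utf8_scalar("\xC0\x80")` returns false because it is an overlong encoding of U+0000.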
For identifiers containing UTF-8, we need to follow the UAX #31 (XID_Start / XID_Continue) rules:
https://www.unicode.org/reports/tr31/
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1949r7.html
Added initial UAX31 Default Identifier validation support in https://github.com/syoyo/tinyusdz/commit/de6b5456297aa01e1640948bd1b39bbbb536eeb3
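To illustrate the shape of a UAX31 Default Identifier check (first code point in XID_Start, the rest in XID_Continue), here is a rough sketch. It is not the code from the commit above. The range table is a tiny hand-picked subset for illustration only; a real implementation needs the full XID_Start / XID_Continue tables from the Unicode Character Database. It also assumes input is already decoded to code points and that a leading underscore is accepted. All names are hypothetical:

```cpp
#include <cstddef>
#include <string>

struct CodepointRange {
  char32_t first;
  char32_t last;
};

// ILLUSTRATIVE SUBSET ONLY: a few ranges of letters that are in XID_Start.
// The real property covers far more code points.
static const CodepointRange kStartSubset[] = {
    {U'A', U'Z'},     {U'a', U'z'},
    {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},  // Latin-1 letters
    {0x3041, 0x3096},                                      // Hiragana letters
    {0x4E00, 0x9FA5},                                      // CJK ideographs
};

inline bool in_start_subset(char32_t c) {
  for (const CodepointRange& r : kStartSubset) {
    if (c >= r.first && c <= r.last) return true;
  }
  return false;
}

// Assumption: '_' may start an identifier in addition to XID_Start.
inline bool is_start(char32_t c) { return c == U'_' || in_start_subset(c); }

// Sketch of XID_Continue: start characters plus ASCII digits. The real
// property also includes combining marks, other digits, and more.
inline bool is_continue(char32_t c) {
  return is_start(c) || (c >= U'0' && c <= U'9');
}

// Hypothetical helper: validates an identifier given as code points.
inline bool is_valid_identifier_sketch(const std::u32string& id) {
  if (id.empty() || !is_start(id[0])) return false;
  for (std::size_t i = 1; i < id.size(); ++i) {
    if (!is_continue(id[i])) return false;
  }
  return true;
}
```

With this subset, `U"prim_01"` and `U"\u3042\u3044"` are accepted, while `U"1abc"` and `U"a-b"` are rejected.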