
[Enhancement] Support UTF8 Identifier

Open syoyo opened this issue 2 years ago • 3 comments

TinyUSDZ now supports UTF-8 identifiers by default in the dev branch.

https://github.com/PixarAnimationStudios/USD/pull/2120

https://openusd.org/release/api/_usd__page__u_t_f_8.html

  • [x] Basic UTF-8 processing
    • https://github.com/syoyo/tinyusdz/blob/dev/src/str-util.hh
  • [x] Support UTF-8 strings in identifiers (e.g. Prim names)
    • [x] UTF8 Default Identifier validation support: https://github.com/syoyo/tinyusdz/commit/de6b5456297aa01e1640948bd1b39bbbb536eeb3
  • [x] UTF-8 support in isValidIdentifier https://github.com/syoyo/tinyusdz/blob/919fb30c4c34a5548ed2fcba058a262d81eb2a42/src/str-util.hh#L253
  • [ ] MakeValidIdentifier for UTF-8 identifiers?
  • [ ] Support sorting UTF-8 strings
  • [ ] Provide UTF-8 normalization support? (If we do, use utf8proc.h: https://github.com/JuliaStrings/utf8proc)
    • pxrUSD recommends NFC, but NFKC is preferable in multilingual environments, so TinyUSDZ would recommend NFKC for normalization.
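
For the open "sorting" item, one possible approach (an assumption here, not something TinyUSDZ is confirmed to do) is to order strings by Unicode code point. For well-formed UTF-8 this is the same as comparing the bytes as unsigned values, so no decoding is needed. The helper names below are hypothetical and are not part of TinyUSDZ:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not a TinyUSDZ API).
// Compares two UTF-8 strings byte by byte as unsigned values.
// For well-formed UTF-8 this matches code point order.
inline bool Utf8CodePointLess(const std::string &a, const std::string &b) {
  return std::lexicographical_compare(
      a.begin(), a.end(), b.begin(), b.end(), [](char x, char y) {
        return static_cast<unsigned char>(x) < static_cast<unsigned char>(y);
      });
}

// Hypothetical helper: sorts names in code point order.
// This is NOT locale-aware collation and does not normalize the input.
inline void SortUtf8ByCodePoint(std::vector<std::string> &names) {
  std::sort(names.begin(), names.end(), Utf8CodePointLess);
}
```

Note that this ordering treats differently normalized spellings of the same text as different strings, which is one reason the normalization item above matters.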

syoyo • Dec 04 '22 19:12

For efficient UTF-8 processing, see:

Validating UTF-8 In Less Than One Instruction Per Byte https://arxiv.org/abs/2010.03090

https://github.com/simdjson/simdjson
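
As a baseline to compare against the SIMD approach referenced above, here is a minimal scalar validator. It is only a sketch: `IsValidUtf8` is a hypothetical name, it is not the algorithm from the paper, and it is not taken from TinyUSDZ or simdjson. It rejects truncated sequences, stray continuation bytes, overlong encodings, surrogates, and values above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical scalar validator (not a TinyUSDZ or simdjson API).
// Returns true if `s` is well-formed UTF-8.
inline bool IsValidUtf8(const std::string &s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  while (i < n) {
    const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
    if (b0 < 0x80) {  // ASCII fast path
      ++i;
      continue;
    }
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if ((b0 & 0xE0) == 0xC0) {
      len = 2;
      cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
      len = 3;
      cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
      len = 4;
      cp = b0 & 0x07;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (n - i < len) return false;  // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
      const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
      if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
      cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong encodings.
    if (len == 2 && cp < 0x80) return false;
    if (len == 3 && cp < 0x800) return false;
    if (len == 4 && cp < 0x10000) return false;
    // Reject surrogates and values beyond the Unicode range.
    if (cp >= 0xD800 && cp <= 0xDFFF) return false;
    if (cp > 0x10FFFF) return false;
    i += len;
  }
  return true;
}
```

A byte-at-a-time loop like this is easy to audit and can serve as a reference when testing a faster implementation against the same inputs.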

syoyo • Feb 03 '23 13:02

For identifiers containing UTF-8, we need to follow the UAX #31 rules (XID_Start / XID_Continue):

https://www.unicode.org/reports/tr31/

https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2021/p1949r7.html
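
To make the shape of a UAX #31 Default Identifier check concrete (first code point in XID_Start, the rest in XID_Continue), here is a rough sketch. Everything in it is an assumption for illustration: the function names are hypothetical, the range tables are a tiny hand-picked subset rather than the real Unicode property data, and allowing a leading underscore is a profile choice (as in P1949) rather than something this thread confirms for TinyUSDZ.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

struct CpRange {
  std::uint32_t lo, hi;
};

// ILLUSTRATIVE SUBSET ONLY. A real implementation needs the full
// XID_Start / XID_Continue tables from the Unicode Character Database.
constexpr CpRange kXidStartSubset[] = {
    {0x41, 0x5A},     {0x61, 0x7A},      // A-Z, a-z
    {0xC0, 0xD6},     {0xD8, 0xF6},      // some Latin-1 letters
    {0x3041, 0x3096}, {0x4E00, 0x9FA5},  // Hiragana, part of CJK
};
// Code points allowed after the first position, in addition to the above.
constexpr CpRange kXidContinueExtraSubset[] = {
    {0x30, 0x39}, {0x5F, 0x5F}, {0x0300, 0x036F},  // digits, '_', combining
};

template <std::size_t N>
inline bool InRanges(const CpRange (&ranges)[N], std::uint32_t cp) {
  for (const CpRange &r : ranges) {
    if (cp >= r.lo && cp <= r.hi) return true;
  }
  return false;
}

// Minimal decoder: reads one code point at s[i] and advances i.
// Requires i < s.size(). Does not reject overlong forms or surrogates.
inline bool DecodeOne(const std::string &s, std::size_t &i,
                      std::uint32_t &cp) {
  const std::uint8_t b0 = static_cast<std::uint8_t>(s[i]);
  std::size_t len = 0;
  if (b0 < 0x80) { len = 1; cp = b0; }
  else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; }
  else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; }
  else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; }
  else return false;
  if (s.size() - i < len) return false;
  for (std::size_t k = 1; k < len; ++k) {
    const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
    if ((b & 0xC0) != 0x80) return false;
    cp = (cp << 6) | (b & 0x3F);
  }
  i += len;
  return true;
}

// Hypothetical check: Start Continue*, with '_' also allowed at the start
// (an assumed profile choice).
inline bool IsValidIdentifierSketch(const std::string &s) {
  if (s.empty()) return false;
  std::size_t i = 0;
  std::uint32_t cp = 0;
  if (!DecodeOne(s, i, cp)) return false;
  if (cp != 0x5F && !InRanges(kXidStartSubset, cp)) return false;
  while (i < s.size()) {
    if (!DecodeOne(s, i, cp)) return false;
    if (!InRanges(kXidStartSubset, cp) &&
        !InRanges(kXidContinueExtraSubset, cp)) {
      return false;
    }
  }
  return true;
}
```

With these tables, `_prim1` and a Hiragana name are accepted, while `1abc` and `a-b` are rejected. A name that begins with a combining mark is also rejected, since combining marks are only allowed in continue position.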

syoyo • Apr 04 '24 18:04

Added initial UAX31 Default Identifier validation support in https://github.com/syoyo/tinyusdz/commit/de6b5456297aa01e1640948bd1b39bbbb536eeb3

syoyo • Apr 05 '24 22:04