icu4x icon indicating copy to clipboard operation
icu4x copied to clipboard

First Implementation for Person Name Formatter.

Open dsipasseuth opened this issue 2 years ago • 6 comments

#3077

initial version.

dsipasseuth avatar Sep 18 '23 16:09 dsipasseuth

I updated the branch to resolve conflicts. I will go ahead and push some additional changes directly to the branch. Then maybe we can get close to merging this :)

sffc avatar Feb 01 '24 01:02 sffc

Working on getting std-related errors resolved. Next:

error[E0152]: found duplicate lang item `panic_impl`
  --> ffi/capi/src/lib.rs:51:1
   |
51 | fn panic(_info: &core::panic::PanicInfo) -> ! {
   | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: the lang item is first defined in crate `std` (which `regex` depends on)
   = note: first definition in `std` loaded from /usr/local/google/home/sffc/.rustup/toolchains/1.75-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-90f6ddbf82de36ec.rlib
   = note: second definition in the local crate (`icu_capi`)

I guess in general how should we handle static regexes? @robertbastian opinions?

sffc avatar Feb 01 '24 03:02 sffc

The regex crate apparently used to have a regex! macro to compile-time create a regex, but that seems to have disappeared. This can be solved with a custom proc macro if nothing else exists.

robertbastian avatar Feb 01 '24 10:02 robertbastian

const_regex is prior art, although not directly usable as we need regex-automata 0.2.

robertbastian avatar Feb 01 '24 10:02 robertbastian

Actually looking at how these regexes are used in more detail, they are parsing a fixed pattern string like "{0}." which we should be doing in datagen anyway and returning a pattern. So I think the solution here is "don't use regexes in this context".

But on the subject of const regexes, for future posterity: probably what we should do here is get the sparse DFA regex bytes from regex_automata, store them somewhere (possibly hard-coded), and construct the regex from them. We could make a macro_rules that takes in the source regex and the bytes and has a unit test that the bytes are the proper regex, and then we can use from_bytes_unchecked.

sffc avatar Feb 01 '24 22:02 sffc

@robertbastian suggested an alternate way to make the PR mergeable is to gate the personnames module with the std feature

sffc avatar Feb 08 '24 16:02 sffc

I got rid of some of the regexes. To get rid of the last one, we need MultiPlaceholderPattern.

sffc avatar Mar 30 '24 07:03 sffc

Big milestone: finally finished migrating all the regexes to icu_pattern. Still some internal cleanup work, but the PR is no longer blocked. I would like to merge as soon as CI comes back green.

sffc avatar Apr 26 '24 00:04 sffc

I'm going to merge this into icu_experimental and we can keep iterating from there.

sffc avatar Apr 26 '24 20:04 sffc