icu4x
icu4x copied to clipboard
First Implementation for Person Name Formatter.
#3077
initial version.
I updated the branch to resolve conflicts. I will go ahead and push some additional changes directly to the branch. Then maybe we can get close to merging this :)
Working on getting std-related errors resolved. Next:
error[E0152]: found duplicate lang item `panic_impl`
--> ffi/capi/src/lib.rs:51:1
|
51 | fn panic(_info: &core::panic::PanicInfo) -> ! {
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: the lang item is first defined in crate `std` (which `regex` depends on)
= note: first definition in `std` loaded from /usr/local/google/home/sffc/.rustup/toolchains/1.75-x86_64-unknown-linux-gnu/lib/rustlib/x86_64-unknown-linux-gnu/lib/libstd-90f6ddbf82de36ec.rlib
= note: second definition in the local crate (`icu_capi`)
I guess in general how should we handle static regexes? @robertbastian opinions?
The regex crate apparently used to have a regex! macro to compile-time create a regex, but that seems to have disappeared. This can be solved with a custom proc macro if nothing else exists.
const_regex is prior art, although not directly usable as we need regex-automata 0.2.
Actually looking at how these regexes are used in more detail, they are parsing a fixed pattern string like "{0}." which we should be doing in datagen anyway and returning a pattern. So I think the solution here is "don't use regexes in this context".
But on the subject of const regexes, for future posterity: probably what we should do here is get the sparse DFA regex bytes from regex_automata, store them somewhere (possibly hard-coded), and construct the regex from them. We could make a macro_rules that takes in the source regex and the bytes and has a unit test that the bytes are the proper regex, and then we can use from_bytes_unchecked.
@robertbastian suggested an alternate way to make the PR mergeable is to gate the personnames module with the std feature
I got rid of some of the regexes. To get rid of the last one, we need MultiPlaceholderPattern.
Big milestone: finally finished migrating all the regexes to icu_pattern. Still some internal cleanup work, but the PR is no longer blocked. I would like to merge as soon as CI comes back green.
I'm going to merge this into icu_experimental and we can keep iterating from there.