ICU-22843 Enable constructing UnicodeString from literal in fixed time
When passing a string literal to any of the legacy constructors that take just a plain pointer to a UTF-16 string it becomes necessary to iterate through the string to find its length, even though this length was known to the compiler (which just has no way of passing it on to the constructor).
But when calling the new templated string view constructor instead it becomes possible for the compiler to use the known length of a string literal to directly create a string view of the correct size and pass this on to the constructor.
By replacing the legacy constructors with the new constructor this is made the default behaviour.
Because non-templated functions take priority it's necessary to replace the legacy constructors or else the new constructor would never be called for any data type that can implicitly decay into a pointer (and this includes string literals). For the time being, this is gated on U_HIDE_DRAFT_API which will select either the old or the new constructors and even though the old constructors technically get deleted the new constructor is made possible to call in identical ways so that no changes to any call sites are necessary.
There is however a snag there and that's an old promise that nullptr is valid input and identical to the default constructor (ie. the empty string) but this is explicitly not valid input for string views, so this requires a backward compatibility shim to catch any nullptr before any attempt is made to use it to create a string view (which would crash most implementations of the standard library).
Checklist
- [x] Required: Issue filed: https://unicode-org.atlassian.net/browse/ICU-22843
- [x] Required: The PR title must be prefixed with a JIRA Issue number.
- [x] Required: The PR description must include the link to the Jira Issue, for example by completing the URL in the first checklist item
- [x] Required: Each commit message must be prefixed with a JIRA Issue number.
- [x] Issue accepted (done by Technical Committee after discussion)
- [ ] Tests included, if applicable
- [ ] API docs and/or User Guide docs changed or added, if applicable
Please take a look at this.
I have intentionally not done anything about the documentation yet, thinking that it'll be a better use of time to first make sure that we all agree that this change should be made at all, and if so, that the implementation is sound.
As can be seen directly in this PR no existing call sites should need to be changed, the updated new templated constructor will accept all existing callers with legacy behaviour kept intact, so I think that the change could be deemed acceptable for the ICU API stability promises. But it must be mentioned that the old constructors really do get removed so in case anyone anywhere has code that does something like taking the address of one of the old constructors then that code will cease to work after this change. As the change is gated on U_HIDE_DRAFT_API there should be ample time to find and address any such unusual usage, if any actually exists at all.
Are you planning to do the same treatment for the other functions that have overloads for both raw-pointer and template-S?
I haven't made any such plans for I haven't really looked at those functions at all (so I don't know if there's anything that would need to be done there).
Notice: the branch changed across the force-push!
- icu4c/source/common/unicode/unistr.h is different
~ Your Friendly Jira-GitHub PR Checker Bot
PS: Please don't squash until after approval, except when you need to rebase.
@eggrobin, @richgillam as you were the reviewers of PR #3076, I'd like to know your opinion on these proposed changes here.
I'd like to know your opinion on these proposed changes here.
I have opined. Apologies for the delayed review, I was on vacation last week. Other than the points discussed above, this seems like an improvement (and reduces the number of constructors, except for the draft transition period).
Reminder...
Are you planning to do the same treatment for the other functions that have overloads for both raw-pointer and template-S?
I haven't made any such plans for I haven't really looked at those functions at all (so I don't know if there's anything that would need to be done there).
Reminder...
I just think that it's a good idea to 1) make sure that we all agree that this should be done at all and 2) agree on how exactly to do it, before spending any time on doing any further changes.
Suggestion:
- drop the additional handling of uint16_t in char16ptr.h
- keep the addition of toU16StringViewNullable
- I don't feel strongly about keeping or reverting the changes from checking U_SIZEOF_WCHAR_T==2 in the template logic vs. in the preprocessor
- keep the replacement of (fn(ptr) + fn(const S&) --> fn(const S &)+nullable
Note: I looked at the other overloads that I added, and AFAICT none of them has a sibling overload that just takes a pointer, so this might be it for now.
@richgillam heads-up...
If there's any reason for why any of this wouldn't work for Apple, now would be a very good time to learn about that.
Are you planning to do the same treatment for the other functions that have overloads for both raw-pointer and template-S?
I've now searched through the entire file without finding any such functions.
OK, it seems like we've reached some kind of silent consensus here, so I've now updated the documentation comments and made this PR ready to merge (if the proposed API change gets approved by the TC).
Notice: the branch changed across the force-push!
- icu4c/source/common/unicode/char16ptr.h is different
- icu4c/source/common/unicode/unistr.h is different
~ Your Friendly Jira-GitHub PR Checker Bot
@richgillam heads-up...
If there's any reason for why any of this wouldn't work for Apple, now would be a very good time to learn about that.
See my comment above. Apple does still define UChar as uint16_t. I tried taking that out a while ago and got all kinds of bug reports. But it sounds like maybe Robin's objections are theoretical, at least in Apple's case, so maybe we leave what Fredrik is doing the way it is?
See my comment above. Apple does still define
UCharasuint16_t. I tried taking that out a while ago and got all kinds of bug reports. But it sounds like maybe Robin's objections are theoretical, at least in Apple's case, so maybe we leave what Fredrik is doing the way it is?
...And now I get to the bottom and am reminded that this was already merged. I lost track-- did we decide to leave the uint_16 stuff in there?
I lost track-- did we decide to leave the
uint_16stuff in there?
It's in, but https://github.com/unicode-org/icu/pull/3144 makes it conditional, removing it starting with libc++ 18. Making a string_view over uint16_t generates warnings with libc++ 18 and no longer works at all with libc++ 19.
It should continue to work if you use a standard library that still supports std::basic_string_view<uint16_t> -- but the standard does not condone it.
I lost track-- did we decide to leave the
uint_16stuff in there?It's in, but #3144 makes it conditional, removing it starting with libc++ 18. Making a string_view over uint16_t generates warnings with libc++ 18 and no longer works at all with libc++ 19.
It should continue to work if you use a standard library that still supports
std::basic_string_view<uint16_t>-- but the standard does not condone it.
Okay. Sounds like it's time for me to figure out how to get everybody to change to using char16_t...