bind_textdomain_codeset() return value updates
bind_textdomain_codeset() is documented to return a string on success, and false on failure. But there is one success case that also returns false. If you query the codeset for a domain that has not been explicitly set (yet), you will also get false:
php > var_dump(bind_textdomain_codeset("foo", NULL));
bool(false)
This is because the C function returns NULL to indicate that the locale's codeset will be used. Quoting from https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/bind_textdomain_codeset.html
If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.
I bring this up because musl only supports UTF-8, and always returns NULL from its bind_textdomain_codeset(). Its PHP counterpart therefore always appears to fail, when it is working as... well, not quite expected, but as documented.
To summarize:
- false will also be returned if no codeset has been explicitly set for the domain
- this function always returns false under musl libc where the only supported codeset is UTF-8
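The same distinction can be seen at the C level. The following is only a sketch, assuming a libc that declares bind_textdomain_codeset() in <libintl.h> (glibc and musl both do). The helper name codeset_query_status and its return codes are made up for illustration; it relies on the POSIX wording that the "not yet set" NULL comes back without errno being changed.

```c
#include <errno.h>
#include <libintl.h>
#include <stddef.h>

/* Hypothetical helper (not part of any libc): query, but never set, the
 * codeset bound to a non-empty domain name.
 * Returns  1 if a codeset is bound,
 *          0 if NULL came back with errno untouched (nothing bound yet),
 *         -1 if NULL came back and errno was set (a real failure). */
int codeset_query_status(const char *domain)
{
    errno = 0;
    const char *codeset = bind_textdomain_codeset(domain, NULL);

    if (codeset != NULL)
        return 1;
    return errno != 0 ? -1 : 0;
}
```

For a domain that has never been configured, such as "foo" in the PHP session above, this should report 0 rather than -1, which is the case PHP currently folds into false.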
POSIX is pretty clear here:
If domainname is a null pointer, textdomain() shall return a pointer to the string containing the current text domain.
and
If domainname is a null pointer or an empty string, bindtextdomain() shall make no changes and return a null pointer without changing errno.
If musl (or any other implementation) returns NULL here, it is not conforming to the specification. And errno being unchanged is to be expected. What is PHP supposed to do? Just saying "yeah, the call was successful" would be wrong.
I'm only talking about bind_textdomain_codeset. There's a separate paragraph about its return value:
If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.
Okay, but then musl could (and likely should) return UTF-8.
Okay, but then musl could (and likely should) return UTF-8.
This was my first thought, too, but if you want to entertain some wild speculation, I think the reasoning is:
- Consistently returning "UTF-8" would do something equivalent to the right thing, but not quite the right thing itself, in the case where the codeset has not been set yet.
- If you are handling the codeset-not-set-yet NULL correctly, the current behavior of always returning NULL is also "equivalent to correct"
- Keeping track of whether or not the codeset has been set yet in some global state would be extra headache when, in either case, we are doing something equivalent to correct
- NULL is simpler than "UTF-8"
So, we wind up with a consistent NULL return.
I think this would be a lot more sensible after https://github.com/php/php-src/issues/17163, but I filed them separately because changing the implementation is a lot harder to do than updating the docs to reflect reality. FWIW you can get a "failure" from musl, but only via errno. This is the entire implementation:
char *bind_textdomain_codeset(const char *domainname, const char *codeset)
{
    if (codeset && strcasecmp(codeset, "UTF-8"))
        errno = EINVAL;
    return NULL;
}
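To make the "only via errno" point concrete, here is a minimal sketch of how a caller could tell an accepted request from a rejected one when the return value is always NULL. The function musl_style_bind_textdomain_codeset is a renamed stand-in that copies the logic quoted above so the example does not depend on which libc is installed, and set_codeset_checked is a hypothetical helper, not an existing API.

```c
#include <errno.h>
#include <stddef.h>
#include <strings.h>

/* Stand-in with the same logic as the musl function quoted above, renamed
 * so that it does not clash with the real libc symbol. */
char *musl_style_bind_textdomain_codeset(const char *domainname,
                                         const char *codeset)
{
    (void)domainname; /* unused, as in the quoted implementation */
    if (codeset && strcasecmp(codeset, "UTF-8"))
        errno = EINVAL;
    return NULL;
}

/* Hypothetical helper: because the return value is always NULL, success
 * and failure can only be told apart by clearing errno before the call
 * and inspecting it afterwards.
 * Returns 0 if the request was accepted (or was a pure query with a NULL
 * codeset), -1 if it was rejected via errno. */
int set_codeset_checked(const char *domain, const char *codeset)
{
    errno = 0;
    (void)musl_style_bind_textdomain_codeset(domain, codeset);
    return errno == 0 ? 0 : -1;
}
```

With this stand-in, "UTF-8" in any letter case and a NULL codeset are accepted, while something like "ISO-8859-1" is rejected with EINVAL. A wrapper that looks only at the returned pointer cannot see that difference.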
bind_textdomain_codeset() is documented to return a string on success, and false on failure
Well, for me it just says "A string on success." That needs to be fixed anyway. And yeah, it should explicitly document what is returned if the codeset has not been set yet (currently, false).
I'm confused by the spec, as it seems to contradict itself:
* If domainname is a null pointer or an empty string, bind_textdomain_codeset() shall make no changes and return a null pointer without changing errno.
* Otherwise, if codeset is a null pointer:
    * If domainname is not bound, the function shall return the implementation-defined default codeset used by the gettext family of functions
But also:
If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.
The terms "If domainname is not bound" and "if a codeset has not yet been set" appear to be equivalent, based on the rest of the spec.
If we consider the first quote, bind_textdomain_codeset(domainname, NULL) is supposed to return the default codeset when no codeset was set before.
If we consider the second quote, it's supposed to return NULL in the same case.
The RETURN VALUE section agrees with the second quote:
A call to the bind_textdomain_codeset() function with a non-empty domainname argument shall return one of the following:
[...]
A null pointer without changing errno if no codeset has yet been bound for that text domain
The GNU gettext manpage agrees with the second quote as well:
If no codeset has been set for domain domainname, it returns NULL.
So the second quote is probably the right one.
Given that setting the codeset always fails on Musl (kind of, as it doesn't set errno), then Musl is actually correct to always return NULL from bind_textdomain_codeset(domainname, NULL).
So it would be enough to just document that false will also be returned if no codeset has been explicitly set for the domain.
I'm confused by the spec, as it seems to contradict itself:
Indeed. This is new in POSIX 2024, and all of the docs I can find for the various implementations (solaris, gnu, musl, freebsd) have it returning NULL, so I wonder where the "implementation defined codeset" part came from.
I got nothing but a headache trying to sign up for the POSIX bug tracker / mailing list, but I did find the email address of a technical editor and sent him a note.
Success: https://austingroupbugs.net/view.php?id=1894