doc-en icon indicating copy to clipboard operation
doc-en copied to clipboard

bind_textdomain_codeset() return value updates

Open orlitzky opened this issue 1 year ago • 9 comments

bind_textdomain_codeset() is documented to return a string on success, and false on failure. But there is one success case that also returns false. If you query the codeset for a domain that has not been explicitly set (yet), you will also get false:

php > var_dump(bind_textdomain_codeset("foo", NULL));
bool(false)

This is because the C function returns NULL to indicate that the locale's codeset will be used. Quoting from https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/bind_textdomain_codeset.html

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

I bring this up because musl only supports UTF-8, and always returns NULL from its bind_textdomain_codeset(). Its PHP counterpart therefore always appears to fail, when it is working as... well, not quite expected, but as documented.

To summarize:

  • false will also be returned if no codeset has been explicitly set for the domain
  • this function always returns false under musl libc where the only supported codeset is UTF-8

orlitzky avatar Dec 15 '24 00:12 orlitzky

POSIX is pretty clear here:

If domainname is a null pointer, textdomain() shall return a pointer to the string containing the current text domain.

and

If domainname is a null pointer or an empty string, bindtextdomain() shall make no changes and return a null pointer without changing errno.

If musl (or any other implementation) return NULL here, they are not conforming to the specification. And that errno is unchanged, is to be expected. What is PHP supposed to do? Just say "yeah, the call was successful" would be wrong.

cmb69 avatar Dec 15 '24 10:12 cmb69

I'm only talking about bind_textdomain_codeset. There's a separate paragraph about its return value:

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

orlitzky avatar Dec 15 '24 15:12 orlitzky

Okay, but than musl could (and likely should) return UTF-8.

cmb69 avatar Dec 15 '24 15:12 cmb69

Okay, but than musl could (and likely should) return UTF-8.

This was my first thought, too, but if you want to entertain some wild speculation, I think the reasoning is:

  • Consistently returning "UTF-8" would do something equivalent to the right thing, but not quite the right thing itself, in the case where the codeset has not been set yet.
  • If you are handling the codeset-not-set-yet NULL correctly, the current behavior of always returning NULL is also "equivalent to correct"
  • Keeping track of whether or not the codeset has been set yet in some global state would be extra headache when, in either case, we are doing something equivalent to correct
  • NULL is simpler than "UTF-8"

So, we wind up with a consistent NULL return.

I think this would be a lot more sensible after https://github.com/php/php-src/issues/17163, but I filed them separately because changing the implementation is a lot harder to do than updating the docs to reflect reality. FWIW you can get a "failure" from musl, but only via errno. This is the entire implementation:

char *bind_textdomain_codeset(const char *domainname, const char *codeset)
{
        if (codeset && strcasecmp(codeset, "UTF-8"))
                errno = EINVAL;
        return NULL;
}

orlitzky avatar Dec 15 '24 15:12 orlitzky

bind_textdomain_codeset() is documented to return a string on success, and false on failure

Well, for me it just says "A string on success." That needs to fixed anyway. And yeah, should explicitly document what is returned if the codeset has not been set yet (currently, false).

cmb69 avatar Dec 15 '24 15:12 cmb69

I'm confused by the spec, as it seems to contradict itself:

 * If domainname is a null pointer or an empty string, bind_textdomain_codeset() shall make no changes and return a null pointer without changing errno.
 * Otherwise, if codeset is a null pointer:
   * If domainname is not bound, the function shall return the implementation-defined default codeset used by the gettext family of functions

But also:

If codeset is a null pointer and domainname is a non-empty string, bind_textdomain_codeset() shall return the current codeset for the named domain, or a null pointer if a codeset has not yet been set.

The terms "If domainname is not bound" and "if a codeset has not yet been set" appear to be equivalent, based on the rest of the spec.

If we consider the first quote, bind_textdomain_codeset(domainname, NULL) is supposed to return the default codeset when no codeset was set before.

If we consider the second quote, it's supposed to return NULL in the same case.

The RETURN VALUE section agrees with the second quote:

A call to the bind_textdomain_codeset() function with a non-empty domainname argument shall return one of the following:

    [...]

    A null pointer without changing errno if no codeset has yet been bound for that text domain

The GNU gettext manpage agrees with the second quote as well:

If no codeset has been set for domain domainname, it returns NULL.

So the second quote is probably the right one.

Given that setting the codeset always fails on Musl (kind of, as it doesn't set errno), then Musl is actually correct to always return NULL from bind_textdomain_codeset(domainname, NULL).

So it would be enough to just document that false will also be returned if no codeset has been explicitly set for the domain.

arnaud-lb avatar Dec 17 '24 15:12 arnaud-lb

I'm confused by the spec, as it seems to contradict itself:

Indeed. This is new in POSIX 2024, and all of the docs I can find for the various implementations (solaris, gnu, musl, freebsd) have it returning NULL, so I wonder where the "implementation defined codeset" part came from.

orlitzky avatar Dec 17 '24 15:12 orlitzky

I got nothing but a headache trying to sign up for the POSIX bug tracker / mailing list, but I did find the email address of a technical editor and sent him a note.

orlitzky avatar Dec 17 '24 16:12 orlitzky

Success: https://austingroupbugs.net/view.php?id=1894

orlitzky avatar Dec 17 '24 19:12 orlitzky