doc-en icon indicating copy to clipboard operation
doc-en copied to clipboard

`strcmp` falsely claims it returns a 0, 1 or -1

Open ash-m opened this issue 1 year ago • 1 comments

The docs for strcmp say:

Return Values

Returns -1 if string1 is less than string2; 1 if string1 is greater than string2, and 0 if they are equal.

Changelog

Version Description
8.2.0 This function now returns -1 or 1, where it previously returned a negative or positive integer.

Problems:

  • The "Return Values" part is patentenly not true: https://3v4l.org/CXhkE#v8.2.0
  • I believe the "Changelog" is refering to when a string is compared to an empty string: https://3v4l.org/UHXcr

I believe the intended logic here is:

  • Attempt to calculate the spread between to strings (unpredictable for large spreads)
  • If the strings match except for trailing characters, return the difference in characters

Whatever the logic, it seems like it does not return very predicable results other than being contrasted to 0. Here are some interesting return values: https://3v4l.org/FN86s

ash-m avatar Jul 29 '24 15:07 ash-m

See https://github.com/php/doc-en/issues/3223. While this has been fixed in the migration guide (but not in UPGRADING), the fix apparently missed the changelog entries.

cmb69 avatar Jul 29 '24 15:07 cmb69

In addition to strcmp() and substr_compare() previously mentioned, the following can be added to the set of misleading documentations:

  • strcasecmp()
  • strncmp()
  • strncasecmp()

https://3v4l.org/Kumk4

mickmackusa avatar Dec 01 '24 20:12 mickmackusa

I don't think there's any intent for these return values to be predictable or meaningful, only useful as a callback for functions like usort. The documentation should simply say:

Returns a value less than 0 if string1 is less than string2; a value more than 0 if string1 is greater than string2, and 0 if they are equal.

IMSoP avatar Dec 02 '24 12:12 IMSoP

I don't think there's any intent for these return values to be predictable or meaningful, only useful as a callback for functions like usort. The documentation should simply say:

Returns a value less than 0 if string1 is less than string2; a value more than 0 if string1 is greater than string2, and 0 if they are equal.

To be honest, I didn't even know they had any meaning. I thought it was just a random number.

kamil-tekiela avatar Dec 02 '24 12:12 kamil-tekiela

To be honest, I didn't even know they had any meaning. I thought it was just a random number.

I think the accurate word would be arbitrary rather than random: there is no attempt to avoid meaning, the values are just a side effect of the underlying implementation. Specifically, they are probably the result of subtracting one value from another, which is generally an efficient way of comparing them - if the result is 0, they're equal; otherwise, the sign of the result indicates which is larger.

IMSoP avatar Dec 02 '24 12:12 IMSoP

I agree with @IMSoP. But see https://github.com/php/php-src/pull/8220 which introduced the incomplete normalization to -1/0/1.

cmb69 avatar Dec 02 '24 12:12 cmb69

Maybe we should just revert https://github.com/php/doc-en/pull/1857, as this might be the wrong one. The changes were intended just for length comparison: https://3v4l.org/dLXlq

drealecs avatar Dec 02 '24 13:12 drealecs

When used as the callback of array_uintersect*() or array_udiff*(), is there a performance benefit to the under-the-hood sorting if the return values of strcmp() (and other related comparison functions) can be beyond -1, 0, 1?

Are there potentially fewer ties to break if a greater range of positive and negative return values is possible?

Mick

mickmackusa avatar Dec 11 '24 12:12 mickmackusa