dbal icon indicating copy to clipboard operation
dbal copied to clipboard

Return numeric-string instead of just string to represent row count

Open simPod opened this issue 3 years ago • 23 comments

Q A
Type improvement
Fixed issues Follows up https://github.com/doctrine/dbal/pull/5269

Summary

The returned string representing row count is always a number so it makes sense to constrain it a little bit more.

simPod avatar Mar 10 '22 07:03 simPod

Looks good in general. I believe, the same idea applies to the QueryBuilder and the Driver\Connection interface and its implementations as well:

  1. https://github.com/doctrine/dbal/blob/7ac309579ac0a14e525144229c8feebf79e8d1e7/src/Query/QueryBuilder.php#L332
  2. https://github.com/doctrine/dbal/blob/7ac309579ac0a14e525144229c8feebf79e8d1e7/src/Query/QueryBuilder.php#L318
  3. https://github.com/doctrine/dbal/blob/e4df1a5834fb78ea2e58c4803b5b8934911561e6/src/Driver/Connection.php#L47

morozov avatar Mar 10 '22 16:03 morozov

Thanks for input, will adapt the PR

On Thu, Mar 10, 2022, 17:24 Sergei Morozov @.***> wrote:

Looks good in general. I believe, the same idea applies to the QueryBuilder and the Driver\Connection interface and its implementations as well:

https://github.com/doctrine/dbal/blob/7ac309579ac0a14e525144229c8feebf79e8d1e7/src/Query/QueryBuilder.php#L332 2. https://github.com/doctrine/dbal/blob/7ac309579ac0a14e525144229c8feebf79e8d1e7/src/Query/QueryBuilder.php#L318 3. https://github.com/doctrine/dbal/blob/e4df1a5834fb78ea2e58c4803b5b8934911561e6/src/Driver/Connection.php#L47

— Reply to this email directly, view it on GitHub https://github.com/doctrine/dbal/pull/5317#issuecomment-1064249810, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACQAJJIVKZVLC4RJXAW4EDU7IOZZANCNFSM5QL2EVVQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you authored the thread.Message ID: @.***>

simPod avatar Mar 10 '22 16:03 simPod

@morozov I believe changing

- public function exec(string $sql): int; 
+ public function exec(string $sql): int|string; 

results in hard BC break since I'll have to remove the type hint (union types not supported here yet)

- public function exec(string $sql): int; 
+ public function exec(string $sql) 

Should I proceed?

simPod avatar Mar 10 '22 22:03 simPod

No, but it should be possible to add a @return or @psalm-return annotation that would override the native return type.

morozov avatar Mar 10 '22 23:03 morozov

Ok so I keep :int but add @return int|string?

On Fri, Mar 11, 2022, 00:43 Sergei Morozov @.***> wrote:

No, but it should be possible to add a @return or @psalm-return annotation that would override the native return type.

— Reply to this email directly, view it on GitHub https://github.com/doctrine/dbal/pull/5317#issuecomment-1064627434, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACQAJN46WWYAHTSZ47TU23U7KCJVANCNFSM5QL2EVVQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you authored the thread.Message ID: @.***>

simPod avatar Mar 10 '22 23:03 simPod

Ok so I keep :int but add @return int|string?

Yes, keep :int and add whatever (@return or @psalm-return) that will make static analyzers understand that it's in fact int|numeric-string.

Off-topic: note that it's not just int but in fact 0|positive-int (see https://github.com/phpstan/phpstan-src/pull/692 for example). While it's not currently necessary, it's not clear how to express a positive-negative-integer-string.

morozov avatar Mar 11 '22 00:03 morozov

@morozov what about PHPDoc tag @return with type int|string is not subtype of native type int. phpstan errors?

simPod avatar Mar 23 '22 10:03 simPod

[…] what about PHPDoc tag @return with type int|string is not subtype of native type int. phpstan errors?

Would it help to remove the conflicting @return annotations and only leave the @psalm-return ones which don't seem to cause any issues?

morozov avatar Mar 23 '22 18:03 morozov

Would it help to remove the conflicting @return annotations and only leave the @psalm-return ones which don't seem to cause any issues?

IMO this would not be good. If the PHP return type is : int then PHP will do type coercion so in the example of executeUpdate if the inner executeStatement returns a numeric-string because it overflowed PHP_INT_MAX, then it doesn't matter what your phpdoc tags say, it will try to return an int and it will fail (see https://3v4l.org/YaedP)

The only correct way here that I see is to remove the int return type from exec&executeUpdate IMO, and then use @return int|string everywhere for dumb parsers plus ideally @psalm-return int<0, max>|numeric-string for static analyzers.

Removing the type hint should not be a huge deal, classes extending it can still specify it without breaking LSP (at least in the 3.3 branch as this was allowed in PHP 7.2) so it's not a BC break I would say.

Seldaek avatar Mar 23 '22 21:03 Seldaek

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase. So right now the int return type also serves as an implicit cast of the strings potentially returned by the underlying drivers. I don't know if any driver returns the number of affected rows lower than PHP_INT_MAX as a string, but if they do, it might be a bigger BC break. I.e. the consuming code may expect an int but will now get a string regardless of the actual value.

Another thought is that it was never reported that a number of affected rows larger than PHP_INT_MAX couldn't be properly reported. Even on a 32-bit platform, it would have to be over 2 billion rows, which is quite a lot.

It is much more likely that changing the signature will affect the API consumers who don't experience this issue than it will fix the issue for those who experience it.

I am fine with introducing this minor BC break in a minor release but we have to be conscious about the real side effects it can lead to.

morozov avatar Mar 23 '22 22:03 morozov

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase. So right now the int return type also serves as an implicit cast of the strings potentially returned by the underlying drivers.

If you look at my example (https://3v4l.org/YaedP) again, it also does not use strict types, and while it also surprised me, it seems like PHP fails the string->int cast if the numeric string is beyond PHP_MAX_INT. So technically you already have the issue with the current code.

But as you say, in practice even on 32bit doing an operation on 2billion rows is pretty damn rare, and 32bit systems themselves are getting rarer, so IMO it's fine to use a plain int return type here, and coerce whatever numeric string we get back to an int. I find this completely reasonable, but I do wonder why not apply the same logic to the other statements then?

If adding return types would be considered a BC break, we can still do return (int) $whatever; to explicitly cast.

I think this would be way less surprising for everyone, and lets you use a much leaner @return int<0, max> return type.

The bigint -> string type thing is a bit of a pain already, but I can understand the rationale there (tho I wish we could forget about 32bit), and having tables with ids going above 2billion is definitely not that far fetched. But again for affected rows a 32bit integer really sounds like it is good enough.

Seldaek avatar Mar 24 '22 08:03 Seldaek

But again for affected rows a 32bit integer really sounds like it is good enough.

Agreed, plus the only 32bit system I had to recently work with is a raspberry pi 😁

Ocramius avatar Mar 24 '22 08:03 Ocramius

Note that the DBAL 3.x codebase doesn't use declare(strict_types=1) in most of its codebase.

Actually, all the classes implementing Driver\Result::rowCount(): int have declare(strict_types=1) in their files, which means that all drivers return an int for the values not greater than PHP_INT_MAX.

The only correct way here that I see is to remove the int return type from exec&executeUpdate IMO, and then use @return int|string everywhere for dumb parsers plus ideally @psalm-return int<0, max>|numeric-string for static analyzers.

Agree.

Removing the type hint should not be a huge deal, classes extending it can still specify it without breaking LSP (at least in the 3.3 branch as this was allowed in PHP 7.2) so it's not a BC break I would say.

Agree. It won't be a BC break.

morozov avatar Mar 24 '22 21:03 morozov

Actually, all the classes implementing Driver\Result::rowCount(): int have declare(strict_types=1) in their files, which means that all drivers return an int for the values not greater than PHP_INT_MAX.

I am a bit confused, if that is the case already then why not keep the int return types on Connection and simply drop the string types?

If Driver\Result::rowCount(): int have declare(strict_types=1) already, it means that for int<0,PHP_INT_MAX> they will return an int, and if the php drivers return a string for values above then rowCount will throw a TypeError.

So having an int|string return type on Connection makes no sense as it will never return string.

As per https://github.com/doctrine/dbal/blob/83f779beaea1893c0bece093ab2104c6d15a7f26/src/Connection.php#L1157-L1160 the other way executeStatement can return is via DriverConnection::exec but that also has an int return type, so it is currently impossible for executeStatement to return a string.

I opened https://github.com/doctrine/dbal/pull/5334 to show more clearly what I suggested doing above in https://github.com/doctrine/dbal/pull/5317#issuecomment-1077349918

IMO this is the best outcome as it completely removes the string type, keeping things simpler for the API consumers, and it is anyway reflecting the current reality of the code better.

Seldaek avatar Mar 25 '22 07:03 Seldaek

I am a bit confused, if that is the case already then why not keep the int return types on Connection and simply drop the string types?

Because the underlying driver can return a string.

morozov avatar Mar 25 '22 13:03 morozov

Yes there is one driver (MySQLi) which can return a string as far as I have seen after looking at all drivers docs this morning. And it will do so only in the case that you have a query affecting over 2billion rows (super rare) AND only on 32bit platforms (super rare). I don't find it justified to add a string return type just for that one edge case, adding burden to every API consumer, but that's obviously just my perspective.

Also note that in the current state, this PR does not fix it as https://github.com/doctrine/dbal/blob/3.3.x/src/Driver/Mysqli/Result.php#L164-L171 and other Result classes still have the int return type, which will turn any overflowing int into a TypeError. So you will right now never get a string back, and the type def is too broad. You also never got anyone reporting such a TypeError I suppose, because as I mentioned above this is a very low probability edge case.

I would appeal to pragmatism here as forcing everyone to deal with int|string when reality is 99.9999999% of users will ever see an int, seems absurd.

P.S: Also another minor aspect I noticed this morning is that strictly speaking some drivers (SQLSrv and Mysqli as per https://github.com/doctrine/dbal/pull/5334/files) do return -1 in some error conditions, which does not fit the int<0, max> type. I don't think this can happen because if it errors most likely Doctrine won't create a Result, but for safety I added a max(0, $...) in that PR, ensuring the returned values match the documented types.

Seldaek avatar Mar 25 '22 14:03 Seldaek

I realize that this is such an insignificant and at the same time such a complex problem that I'm not willing to spend time on solving it.

It is insignificant because the edge case is indeed rare. It is complex because different underlying drivers are handling it differently. I.e. SQL Server seems to be unable to return more than PHP_INT_MAX rows (not sure about handling the updates). Oracle seems to be returning uint as int which might lead to a negative number at the end.

I don't believe we have the integration tests that validate the semantics of these values in all cases (e.g. select, insert, update where all matching rows indeed were updated or some of them already had the value set by the update).

With that said, whoever has the commit permissions for this repo has my full blessing in doing whatever they deem reasonable.

I would appeal to pragmatism here as forcing everyone to deal with int|string when reality is 99.9999999% of users will ever see an int, seems absurd.

This appeal should be targeted at the maintainers of the underlying drivers first of all. BTW @kamil-tekiela contributed some changes that allowed to simplify the DBAL code for mysqli recently.

morozov avatar Mar 25 '22 15:03 morozov

OK, then if you rather not solve anything, I would suggest leaving the php types as they are now, broken or not that they may be in edge cases, and switch to a simple @return int instead of int|string. That would not require any code change. It also reflects reality because drivers will return an int or throw a TypeError right now (as of 3.3.x code state I mean) in pathologic cases.

Removing the int<0, max> part makes it simpler due to drivers returning -1 and the fact that these driver methods are not properly typed in psalm/phpstan yet either..

If you agree I am happy to prepare a new PR (or @simPod updates this one here?). It'd be risk free in terms of code, but improves the type situation for API consumers which I believe was @simPod's intent as well.

Seldaek avatar Mar 25 '22 16:03 Seldaek

There hasn't been any activity on this pull request in the past 90 days, so it has been marked as stale and it will be closed automatically if no further activity occurs in the next 7 days. If you want to continue working on it, please leave a comment.

github-actions[bot] avatar Jun 24 '22 03:06 github-actions[bot]

This pull request was closed due to inactivity.

github-actions[bot] avatar Jul 01 '22 03:07 github-actions[bot]

@doctrine what do you think about @Seldaek' comment

simPod avatar Jul 01 '22 06:07 simPod

Yeah this should be reopened IMO. @morozov if you can let us know what you think about https://github.com/doctrine/dbal/pull/5317#issuecomment-1079180493 I'll happily send a PR.

Seldaek avatar Jul 01 '22 08:07 Seldaek

I'll be fine with whatever changes that somebody else from the maintainers approves. After skimming over the thread, I don't want to work on this change.

morozov avatar Jul 01 '22 15:07 morozov

Closing due to the lack of progress.

morozov avatar Sep 18 '22 19:09 morozov

What is your feedback on https://github.com/doctrine/dbal/pull/5334 then?

simPod avatar Sep 18 '22 19:09 simPod

I don't have any feedback on in, it's closed.

morozov avatar Sep 18 '22 19:09 morozov