phpstan-doctrine
phpstan-doctrine copied to clipboard
Strings not allowed in QueryBuilder::where parameters
Shouldn't strings be allowed in the QueryBuilder::where parameters?
$qb->select(['u'])
->from(User::class, 'u')
->where(
$qb->expr()->isNotNull('u.foo')
)
I think the example above is a valid usecase and the isNotNull method returns string. But in the QueryBuilder stub string is not allowed (https://github.com/phpstan/phpstan-doctrine/commit/5a5454cf8d7916b39c6423beed592fb141743937#diff-197b8a38b2ae878d60d27968735dd0b7208e015f401be1b2c5761add8fe8b85dR25)
What error are you seeing?
Parameter #1 $predicates of method Doctrine\ORM\QueryBuilder::where() expects array|object|literal-string, string given.
/cc @craigfrancis We should stub these Expr methods to also return literal-string.
The isNotNull() method is very simple in what it does:
/**
* Creates an IS NOT NULL expression with the given arguments.
*
* @param string $x Field in string format to be restricted by IS NOT NULL.
*
* @return string
*/
public function isNotNull($x)
{
return $x . ' IS NOT NULL';
}
So the input $x
would need to be a literal-string
for it to return a literal-string
, which I think is reasonable.
Having a quick look at the other methods in Expr
that @return string
, I should do:
- countDistinct($x)
- isNull($x)
- isNotNull($x)
- between($val, $x, $y)
In the future I will be looking at the other Expr\*
objects (e.g. Expr\Comparison), but those shouldn't be affected by the where()
requirement of accepting literal-string|object|array<mixed>
.
So the input $x would need to be a literal-string for it to return a literal-string, which I think is reasonable.
Yeah, needs a dynamic return type extension then.
@ondrejmirtes; Am I going in the right direction with this patch?
Not sure if I sure if I should be returning an IntersectionType
like this (got it from ConstantStringType).
And should this check bleedingEdge
?
@petaak, do you have any thoughts on this alternative approach; where this is fine:
$qb->expr()->between('a.published', '"2022-01-01"', '"2022-01-31"')
But as soon as I do something wrong, e.g.
$unsafe_user_data = (string) ($_GET['unsafe_user_data'] ?? '"2022-01-31"');
$qb->expr()->between('a.published', '"2022-01-01"', $unsafe_user_data);
While my patch will correctly complain when a non-literal-string
is given to QueryBuilder::where()
, I'm wondering if I should be updating the input types for these 4 methods, like ->between():
* @param mixed $val Valued to be inspected by range values.
* @param int|string $x Starting range value to be used in BETWEEN() function.
* @param int|string $y End point value to be used in BETWEEN() function.
*
* @return string A BETWEEN expression.
I could change them to:
* @param literal-string $val Valued to be inspected by range values.
* @param literal-string $x Starting range value to be used in BETWEEN() function.
* @param literal-string $y End point value to be used in BETWEEN() function.
*
* @return literal-string A BETWEEN expression.
But I am wondering why $val
is set to mixed
- maybe a stringable value-object, like Func?
And $x
and $y
, where they accept an integer or a generic string; this kind of abstraction worries me, because I suspect some developers will accidentally pass unsafe user values into the SQL (i.e. not using ->setParameter()
).
I'm sorry for my late answer. Yes, this patch is fine, if you add some tests :) Thanks.
Thanks @ondrejmirtes;
I've been thinking about this a lot, and because these functions return DQL (like SQL), I'm fairly sure their inputs should be checked as literal-string
's, so errors can be easily identified (i.e. as soon as they are made)... I do have some boring work first, then I'll do some tests for both approaches (and hopefully @petaak can confirm which would be the best route).
So we have two options:
- Make the methods accept literal-string and return literal-string (update the PHPDoc stub)
- Continue accepting
string
and returnstring
via PHPDocs, but write a dynamic return type extension that will make them returnliteral-string
whenliteral-string
is passed to them.
Since these methods do not execute DQL/SQL directly, I think option number 2 is fine.
Both certainly work, and I'm happy to do either, I'm just thinking about how easy it would be to find/fix issues (something Dan Ackroyd is quite passionate about).
Please forgive the contrived example, but if you had this code:
$condition = NULL;
$max_dob = (string) ($_GET['max_dob'] ?? '');
if ($max_dob) {
$condition = $qb->expr()->between('u.dob', "'2000-01-01'", "'" . $max_dob . "'"); // INSECURE
}
$min_age = (string) ($_GET['min_age'] ?? '');
$max_age = (string) ($_GET['max_age'] ?? '');
if ($min_age && $max_age) {
$condition = $qb->expr()->between($qb->expr()->abs('u.age'), $min_age, $max_age); // INSECURE
}
if ($condition) {
$results = $qb
->select(['u'])
->from(User::class, 'u')
->where($condition)
}
The first option (PHPDoc stub) will highlight the three issues on line 5 (x1) and line 11 (x2); the second option (dynamic return type) will only highlight a problem on line 18.
(as an aide, I know this is on the simpler side, but I'm always amazed at how PHPStan can follow all of my weird tests/experiments).
I understand your line of reasoning but I'd still like the option 2. Because:
- Who am I to judge the user whether they're calling
$qb->expr()->between
to create part of a DQL query, of if they're calling it for some other reason where it's perfectly fine to have just astring
not aliteral-string
. - You showed an example where the user would benefit from the calls on line 5 and 11 to be marked as wrong. But there are thousands of similar situations in PHPStan where a similar thing happens and is not related to literal-string, so I don't think we need to focus on this specific use-case, but rather improve PHPStan itself to handle all of these situations better by telling the user where the type comes from perhaps.
Thanks @ondrejmirtes, that's fair, and Dan is "Not interested", so I'll complete my patch with some tests (might be a couple of days, due to work things).
I've added some tests, but I'm not completely sure about this yet (thoughts welcome), because these functions can also accept stringable objects.
I can't think of any realistic examples at the moment, so I've gone with MOD() and ABS():
$qb->expr()->isNull($qb->expr()->mod('field', '0'));
// "MOD(field, 0) IS NULL"
$qb->expr()->between($qb->expr()->abs('field'), '10', '30')
// "ABS(field) BETWEEN 10 AND 30"
Because the example inputs to isNull()
and between()
include stringable value-objects, I'm not completely comfortable saying isNull()
or between()
return a literal-string
in this case (and the PR doesn't).
I'm wondering if the Expr\Func
value-objects could be marked as having been created with literal-string
inputs? This is on the basis that __toString() is used to simply return the concatenated inputs.
This is similar with other functions, like $qb->expr()->eq('field', '123')
, which returns a Expr\Comparison
with it's own __toString(), so it can be passed as an object to where()
(which is why $predicates currently accepts any object, i.e. literal-string|object|array<mixed>
).
With the unit tests, QueryBuilderDqlRuleSlowTest
and QueryBuilderDqlRuleTest
, they use qbExprIsNull and qbExprIsNullSyntaxError, which use isNull()
, and now report "Could not analyse QueryBuilder with dynamic arguments". Should I try to bypass that check (maybe update QueryBuilderDqlRule.php), or use a different function (e.g. for the syntax error one, use ->like('e.nickname', 'x')
), or do something else?