phpstan-doctrine icon indicating copy to clipboard operation
phpstan-doctrine copied to clipboard

Strings not allowed in QueryBuilder::where parameters

Open petaak opened this issue 3 years ago • 14 comments

Shouldn't strings be allowed in the QueryBuilder::where parameters?

$qb->select(['u'])
   ->from(User::class, 'u')
   ->where(
       $qb->expr()->isNotNull('u.foo')
   )

I think the example above is a valid usecase and the isNotNull method returns string. But in the QueryBuilder stub string is not allowed (https://github.com/phpstan/phpstan-doctrine/commit/5a5454cf8d7916b39c6423beed592fb141743937#diff-197b8a38b2ae878d60d27968735dd0b7208e015f401be1b2c5761add8fe8b85dR25)

petaak avatar Feb 23 '22 12:02 petaak

What error are you seeing?

ondrejmirtes avatar Feb 23 '22 12:02 ondrejmirtes

Parameter #1 $predicates of method Doctrine\ORM\QueryBuilder::where() expects array|object|literal-string, string given.

petaak avatar Feb 23 '22 13:02 petaak

/cc @craigfrancis We should stub these Expr methods to also return literal-string.

ondrejmirtes avatar Feb 23 '22 14:02 ondrejmirtes

The isNotNull() method is very simple in what it does:

    /**
     * Creates an IS NOT NULL expression with the given arguments.
     *
     * @param string $x Field in string format to be restricted by IS NOT NULL.
     *
     * @return string
     */
    public function isNotNull($x)
    {
        return $x . ' IS NOT NULL';
    }

So the input $x would need to be a literal-string for it to return a literal-string, which I think is reasonable.

Having a quick look at the other methods in Expr that @return string, I should do:

  1. countDistinct($x)
  2. isNull($x)
  3. isNotNull($x)
  4. between($val, $x, $y)

In the future I will be looking at the other Expr\* objects (e.g. Expr\Comparison), but those shouldn't be affected by the where() requirement of accepting literal-string|object|array<mixed>.

craigfrancis avatar Feb 23 '22 15:02 craigfrancis

So the input $x would need to be a literal-string for it to return a literal-string, which I think is reasonable.

Yeah, needs a dynamic return type extension then.

ondrejmirtes avatar Feb 23 '22 15:02 ondrejmirtes

@ondrejmirtes; Am I going in the right direction with this patch?

Not sure if I sure if I should be returning an IntersectionType like this (got it from ConstantStringType).

And should this check bleedingEdge?


@petaak, do you have any thoughts on this alternative approach; where this is fine:

$qb->expr()->between('a.published', '"2022-01-01"', '"2022-01-31"')

But as soon as I do something wrong, e.g.

$unsafe_user_data = (string) ($_GET['unsafe_user_data'] ?? '"2022-01-31"');

$qb->expr()->between('a.published', '"2022-01-01"', $unsafe_user_data);

While my patch will correctly complain when a non-literal-string is given to QueryBuilder::where(), I'm wondering if I should be updating the input types for these 4 methods, like ->between():

     * @param mixed      $val Valued to be inspected by range values.
     * @param int|string $x   Starting range value to be used in BETWEEN() function.
     * @param int|string $y   End point value to be used in BETWEEN() function.
     *
     * @return string A BETWEEN expression.

I could change them to:

     * @param literal-string $val Valued to be inspected by range values.
     * @param literal-string $x   Starting range value to be used in BETWEEN() function.
     * @param literal-string $y   End point value to be used in BETWEEN() function.
     *
     * @return literal-string A BETWEEN expression.

But I am wondering why $val is set to mixed - maybe a stringable value-object, like Func?

And $x and $y, where they accept an integer or a generic string; this kind of abstraction worries me, because I suspect some developers will accidentally pass unsafe user values into the SQL (i.e. not using ->setParameter()).

craigfrancis avatar Feb 23 '22 18:02 craigfrancis

I'm sorry for my late answer. Yes, this patch is fine, if you add some tests :) Thanks.

ondrejmirtes avatar Mar 02 '22 09:03 ondrejmirtes

Thanks @ondrejmirtes;

I've been thinking about this a lot, and because these functions return DQL (like SQL), I'm fairly sure their inputs should be checked as literal-string's, so errors can be easily identified (i.e. as soon as they are made)... I do have some boring work first, then I'll do some tests for both approaches (and hopefully @petaak can confirm which would be the best route).

craigfrancis avatar Mar 02 '22 18:03 craigfrancis

So we have two options:

  1. Make the methods accept literal-string and return literal-string (update the PHPDoc stub)
  2. Continue accepting string and return string via PHPDocs, but write a dynamic return type extension that will make them return literal-string when literal-string is passed to them.

Since these methods do not execute DQL/SQL directly, I think option number 2 is fine.

ondrejmirtes avatar Mar 03 '22 07:03 ondrejmirtes

Both certainly work, and I'm happy to do either, I'm just thinking about how easy it would be to find/fix issues (something Dan Ackroyd is quite passionate about).

Please forgive the contrived example, but if you had this code:

$condition = NULL;

$max_dob = (string) ($_GET['max_dob'] ?? '');
if ($max_dob) {
  $condition = $qb->expr()->between('u.dob', "'2000-01-01'", "'" . $max_dob . "'"); // INSECURE
}

$min_age = (string) ($_GET['min_age'] ?? '');
$max_age = (string) ($_GET['max_age'] ?? '');
if ($min_age && $max_age) {
  $condition = $qb->expr()->between($qb->expr()->abs('u.age'), $min_age, $max_age); // INSECURE
}

if ($condition) {
  $results = $qb
     ->select(['u'])
     ->from(User::class, 'u')
     ->where($condition)
}

The first option (PHPDoc stub) will highlight the three issues on line 5 (x1) and line 11 (x2); the second option (dynamic return type) will only highlight a problem on line 18.

(as an aide, I know this is on the simpler side, but I'm always amazed at how PHPStan can follow all of my weird tests/experiments).

craigfrancis avatar Mar 03 '22 14:03 craigfrancis

I understand your line of reasoning but I'd still like the option 2. Because:

  • Who am I to judge the user whether they're calling $qb->expr()->between to create part of a DQL query, of if they're calling it for some other reason where it's perfectly fine to have just a string not a literal-string.
  • You showed an example where the user would benefit from the calls on line 5 and 11 to be marked as wrong. But there are thousands of similar situations in PHPStan where a similar thing happens and is not related to literal-string, so I don't think we need to focus on this specific use-case, but rather improve PHPStan itself to handle all of these situations better by telling the user where the type comes from perhaps.

ondrejmirtes avatar Mar 03 '22 15:03 ondrejmirtes

Thanks @ondrejmirtes, that's fair, and Dan is "Not interested", so I'll complete my patch with some tests (might be a couple of days, due to work things).

craigfrancis avatar Mar 04 '22 11:03 craigfrancis

I've added some tests, but I'm not completely sure about this yet (thoughts welcome), because these functions can also accept stringable objects.

I can't think of any realistic examples at the moment, so I've gone with MOD() and ABS():

$qb->expr()->isNull($qb->expr()->mod('field', '0'));
// "MOD(field, 0) IS NULL"

$qb->expr()->between($qb->expr()->abs('field'), '10', '30')
// "ABS(field) BETWEEN 10 AND 30"

Because the example inputs to isNull() and between() include stringable value-objects, I'm not completely comfortable saying isNull() or between() return a literal-string in this case (and the PR doesn't).

I'm wondering if the Expr\Func value-objects could be marked as having been created with literal-string inputs? This is on the basis that __toString() is used to simply return the concatenated inputs.

This is similar with other functions, like $qb->expr()->eq('field', '123'), which returns a Expr\Comparison with it's own __toString(), so it can be passed as an object to where() (which is why $predicates currently accepts any object, i.e. literal-string|object|array<mixed>).

craigfrancis avatar Mar 06 '22 12:03 craigfrancis

With the unit tests, QueryBuilderDqlRuleSlowTest and QueryBuilderDqlRuleTest, they use qbExprIsNull and qbExprIsNullSyntaxError, which use isNull(), and now report "Could not analyse QueryBuilder with dynamic arguments". Should I try to bypass that check (maybe update QueryBuilderDqlRule.php), or use a different function (e.g. for the syntax error one, use ->like('e.nickname', 'x')), or do something else?

craigfrancis avatar Mar 06 '22 15:03 craigfrancis