framework icon indicating copy to clipboard operation
framework copied to clipboard

[10.x] database expressions with grammar-specific formatting

Open tpetry opened this issue 3 years ago • 10 comments
trafficstars

Last year I proposed PR #40115 to change database expressions to be grammar-enhanced. But my PR was changing too much, so I redid the whole concept to change as least as possible to be easier to merge for the next Laravel major release.

What is the problem?

Laravel's database implementation provides us a good way of working with multiple databases while abstracting away the inner workings. We don't have to think about small syntax differences when using a query builder or how column names are escaped to not interfere with reserved keywords.

However, when we want to use more database functionality we have to fall back to raw database code and writing database-specific code to e.g. quote column names or column aliases:


DB::table('table')
    ->when(isPostgreSQL(), fn ($query) => $query->select(DB::raw('coalesce("user", "admin") AS "value"')))
    ->when(isMySQL(), fn ($query) => $query->select(DB::raw('coalesce(`user`, `admin`) AS `value`')))

How can this be solved?

Currently the Expression class returned by DB::raw is just a container to indicate that the value should not be further processed. My implementation changes that approach a little bit by declaring an expression contract DB::raw uses and custom classes can implement:

interface Expression
{
    /**
     * Get the value of the expression.
     *
     * @param  \Illuminate\Database\Grammar  $grammar
     * @return scalar
     */
    public function getValue(Grammar $grammar);
}

Following that idea, the operations normally done by raw expressions and statements can be created as reusable expression classes. The former example could be written like this, which is totally database-agnostic for the one using those classes instead of raw expressions:

DB::table('table')
    ->select(new Alias(new Coalesce(['user', 'admin']), 'count'));


class Alias implements Expression
{
    public function __construct(
    public readonly Expression|string $expression,
    public readonly string $name,
    ) { }

    public function getValue(Grammar $grammar): string
    {
        return match ($grammar->isExpression($this->expression)) {
            true => "{$grammar->getValue($this->expression)} as {$grammar->wrap($this->name)}",
            false => $grammar->wrap("{$this->name} as {$this->name}"),
        };
    }
}

class Coalesce implements Expression
{
    public function __construct(
        public readonly array $expressions,
    ) { }

    public function getValue(Grammar $grammar): string
    {
        $expressions = array_map(function ($expression) use($grammar): string {
            return match ($grammar->isExpression($expression)) {
                true => $grammar->getValue($expression),
                false => $grammar->wrap($expression),
            };
        }, $this->expressions);
        $expressions = implode(', ', $expressions);

        return "coalesce({$expressions})";
    }
}

But isn't it much work to write those classes?

Kind of. But keep in mind these have to be written only once and you don't have to be the one. Like with every package created by the community anyone can start and write those SQL expressions for anyone to use. When this PR will be merged into 10.x I will start by writing the most common ones at tpetry/laravel-query-expressions.

But I don't like the syntax

That's ok as I don't propose any syntax, I propose a system of how this can work. You can start a package that will provide a macro-able facade to do the same with static function calls as long as those calls return expression objects:

Expr::alias(Expr::coalesce(['user', 'admin']), 'count')

An implementation even doesn't have to use this wrapping. I can imagine a parser could transform a special little syntax to create those object chains from a string:

DB::table('table')
    ->select(exprParse('[user], [admin] ~> coalesce ~> alias:count'));

As said, this PR is not about providing syntax. It is about adding a building block packages can use to build something great

Outlook

In future PRs I would like to work on more things useful for database expressions:

  • Access to the database type:. Some functions are implemented differently by the databases. When the Grammar would provide methods to identify it as e.g. a MySQL grammar, the expression could use a different function for regex replacement than for e.g. PostgreSQL.
  • Value bindings: Right now request-provided values can't be added to expressions as the value could contain a SQL injection. When the Grammar would provide method to quote() a value these could be safely injected into an expression.
  • Database version: Some features are easily done on recent database versions as critical features have been added while a lot of boilerplate is needed for older releases. When the Grammar would provide the database type and version a grammar class could use best implementation for every database and version.

tpetry avatar Oct 30 '22 12:10 tpetry

This would be really nice in my opinion! This would make the DB expressions more typesafe.

Wulfheart avatar Oct 30 '22 13:10 Wulfheart

This is pretty slick, nicely done.

Two things:

First, is the new 'admin' in your example a typo, or some new PHP 8 syntax?

Second, should we preserve the __toString() in src/Illuminate/Database/Query/Expression.php for backwards compatibility?

michaeldyrynda avatar Oct 30 '22 20:10 michaeldyrynda

First, is the new 'admin' in your example a typo, or some new PHP 8 syntax?

That is a typo. I simplified the example to make it easier to understand and missed the new keyword.

Second, should we preserve the __toString() in src/Illuminate/Database/Query/Expression.php for backwards compatibility?

That's not easily possible. We could do that but then we have two different expression implementations that need to be handled differently. And I guess almost no one really extended the Expression class.

tpetry avatar Oct 31 '22 07:10 tpetry

I can see how this PR seems like a good compromise. It basically expands the Expression class to be able to take a Grammar as dependency in order to get help from the Grammar to generate the underlying DB expression. Personally I don't like any potential syntax that will arise from this due to how one expression needs to wrap another one, but at least that's no longer the Framework's concern and will be handled exclusively outside of the framework. Another thing that I think is extremely edge-case is the need to write a single raw expression compatible with multiple databases. Sure, things like Laravel Nova may theoretically benefit from such a feature, but truth is most applications will target only one database and when doing raw expressions, doing the raw expression of your database will likely yield a better syntax.

In summary, I don't particularly like the code that will be written as a result of this feature, but this feature in itself isn't that much harmful to the framework. I just wonder whether there's a simpler approach to provide the Grammar to the Expression class with less changes, but that requires digging deeper...

deleugpn avatar Nov 02 '22 12:11 deleugpn

Can you provide the breaking changes and the upgrade steps we would need to include in the 10.x upgrade guide if this is merged?

taylorotwell avatar Nov 02 '22 20:11 taylorotwell

Sure.

Breaking Changes

The Illuminate\Database\Query\Expression class now implements the Illuminate\Contracts\Database\Query\Expression contract. An expression can now return any scalar value instead of the former requirement to return strings.

Upgrade Guide

If you use DB::raw or create instances from Illuminate\Database\Query\Expression you don't have to change anything. All changes are transparent to you. But if you do extend the Illuminate\Database\Query\Expression class you have to change your implementation. The SQL code you returned in the __toString method needs to be moved to the getValue method.

tpetry avatar Nov 03 '22 19:11 tpetry

@tpetry It might also be worth noting that because of the removal from __toString() method towards getValue() that any expression cannot be converted to a string using typecasting anymore due to the now missing Grammar context in that method.

I wonder if we could add that back somehow? I am however not even sure if doing (string) new \Illuminate\Database\Query\Expression('some expression') is something we want to support?

morloderex avatar Nov 18 '22 00:11 morloderex

I am not sure whether that counts as a breaking change for users of Laravel, as only the Laravel core is transforming those expressions into strings. I did choose against both ways to get the string for an expression because it increases the complexity of handling those expressions. One way is more straightforward than two ways. And as I said, I don't expect anyone having extended those classes. I only found the idea of doing it in a tweet which I used as an idea to work on this PR.

tpetry avatar Nov 18 '22 07:11 tpetry

@tpetry That is fantastic. What do you think about adding a separate GrammarAwareInterface that contains a single setGrammar method? I think that would allow to keep the original toString method and avoid the breaking change.

xalaida avatar Jan 02 '23 10:01 xalaida

As said before, I found no one really extending the Expression class. Therefore adding two ways to handle them would add a lot more complexity to Eloquent than just doing a breaking change in a major release for something maybe 1 of 1.000 have done.

tpetry avatar Jan 03 '23 12:01 tpetry

I think this look fairly good to me if we can get the conflicts fixed @tpetry 👍

taylorotwell avatar Jan 23 '23 04:01 taylorotwell

@taylorotwell Done. Rebased it on the newest 10.x commit.

tpetry avatar Jan 23 '23 11:01 tpetry

Thanks! 👍

taylorotwell avatar Jan 24 '23 22:01 taylorotwell

@tpetry I'm curious, in what case would an expression ever return anything but a SQL string from getValue? I thought it was supposed to represent a SQL expression.

willrowe avatar Jan 25 '23 01:01 willrowe

An integer or float is still representable as string. And it was completely valid to use them before, so I doesnt want to break this.

tpetry avatar Jan 25 '23 09:01 tpetry

@tpetry I know it allowed those types previously, I was just wondering if there was an example of a situation in which something other than a string would need to be returned.

willrowe avatar Jan 25 '23 13:01 willrowe

Hey @tpetry - I've finally finished my review of this. Thanks for your patience!

I wanted to get some further elaboration on your "Outlook" section regarding adding bindings / quoted values from the expression. What was your thoughts there on how that would work. While this PR definitely provides an improvement over our current expression capabilities, I can see people quickly asking for the ability for their expressions to be able to safely add bindings to the query. Any thoughts on how that would look or be implemented?

taylorotwell avatar Jan 27 '23 21:01 taylorotwell

I did that in my old PR. Basically we have to the grammar. In the future (in my opinion) the query grammar instance should contain a database connection object (PDO). That way it could provide escaping of raw content based on the current connection settings (the charset encoding is important with mysql to prevent sql injections!) or export the version number of the database (to use newer sql features when available).

I could work backport this one tomorrow or tuesday if this merged. And we‘ll check it for 10.x or better plan it for 11.x.

tpetry avatar Jan 27 '23 21:01 tpetry

@tpetry now that I have a pretty decent grasp of this PR we can just try to expose the PDO connection to the grammar with another commit on this PR if you don't mind?

taylorotwell avatar Jan 27 '23 21:01 taylorotwell

I can do that, on tomorrow. If those changes are then too much for you, feel free to just revert the additional commit.

tpetry avatar Jan 27 '23 22:01 tpetry

While thinking about it, I would like to delay that change about linking the grammar with the connection. I don‘t want to break the 10.x release. Let me work on the grammar thing in a backward compatible way within L10. I guess other db providers (like mine, staudenmeier or singlestore) may break with such a change? Its a little bit late for that in the 10.x process. I‘ll find a way shortly after the 10.x release for grammars to opt-in for that feature if they want support it.

tpetry avatar Jan 27 '23 22:01 tpetry

@taylorotwell Sorry, I did write the last message misleading last night. I am ok with merging this PR. Adding the connection to grammars I would like to work on shortly after the 10.x release to do it in a safe backwards-compatible way. I don‘t want to break other non-core drivers so shortly before the final release.

tpetry avatar Jan 28 '23 09:01 tpetry

Was there a reason for not just passing in an extension of Expression that implements your own __toString() method.

https://www.php.net/manual/en/class.stringable.php

I might be missing something?

danrichards avatar Mar 14 '23 01:03 danrichards

@tpetry Why did you not add the classes like Alias in the PR? i would be nice if they were generally available to anyone.

joelharkes avatar Apr 02 '23 05:04 joelharkes