framework icon indicating copy to clipboard operation
framework copied to clipboard

Add orderByPriority Method for Custom Sorting in Query Builder V2

Open Mushood opened this issue 1 year ago • 13 comments

Building upon the work done in PR #52483

Explanation: This commit introduces the orderByPriority method to Laravel's query builder, enabling developers to sort query results based on a custom priority array. For example, you can now order records by specific IDs in a preferred sequence like this:

Model::whereIn('id', [2, 4, 6, 1, 3, 5])->orderByPriority('id', [2, 4, 6, 1, 3, 5])->get();

For MYSQL and Mariadb This method leverages SQL's FIELD() function, allowing developers to easily display results in a user-defined order.

For Postgres, Sqlite and SQL server, the CASE() function was used to achieve the same result.

Compared to the previous PR, I have placed the logic in the corresponding grammar class as suggested to allow for flexibility of implementation depending on the database server.

Impact: This feature greatly enhances the flexibility of data retrieval, particularly when a specific order of items is crucial for the application's logic or user interface. We have to note that if the larger the array of priority, we might introduce performance issues

For reference:

  • https://mariadb.com/docs/skysql-dbaas/ref/xpand/functions/FIELD/
  • https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_field

Mushood avatar Aug 20 '24 16:08 Mushood

Please lowercase all the generated SQL: CASE WHEN => case when etc.

Please also add tests for Builder::orderByPriority(), especially the exceptions.

staudenmeir avatar Aug 20 '24 21:08 staudenmeir

@staudenmeir

Thank you for taking the time to review.

I have applied lowercase for all the generated SQL. I have also added tests for exceptions and test for the function in the builder itself

Mushood avatar Aug 21 '24 04:08 Mushood

I noticed that the two types of implementation behave differently with values that aren't part of the $priority array. If you use ->orderByPriority('id', ['c', 'b', 'a']) while the column also contains d and e, these records get treated differently depending on the database:

  • MySQL and MariaDB put them first because FIELD() returns 0 for unspecified values: 'd', 'e', 'c', 'b', 'a'
  • The CASE WHEN implementation puts them last because of the ELSE part: 'c', 'b', 'a', 'd', 'e'

I think it makes much more sense to put them last, but I don't see an elegant way to achieve this behavior on MySQL/MariaDB.

Not sure if that's an issue. You could argue that orderByPriority() shouldn't be used this way.

staudenmeir avatar Aug 21 '24 09:08 staudenmeir

Would it be worth using ORDER BY array_position(ARRAY[...], field::text) in Postgres? https://www.crossingtheruby.com/2023/10/12/postgres-custom-sort-order

ziadoz avatar Aug 21 '24 10:08 ziadoz

@staudenmeir I concur with your observation

database present absent priority direction result
mysql/mariadb A B C D A B C asc D A B C
mysql/mariadb A B C D A B C desc C B A D
           
postgres/sqlite A B C D A B C asc D C B A
postgres/sqlite A B C D A B C desc A B C D

In order to make this field behavior consistent with the case behavior, it might make sense to change the "else" condition $caseStatement = 'case '.implode(' ', $cases).' else 0 end';

However, I do agree that it does not feel right that these values come first.

Mushood avatar Aug 21 '24 18:08 Mushood

Observation I pondered on whether field was much better than case and by how much.

In a table of 5000 records and a priority array of 1000, CASE executed in 34 ms on average Field instead took 23 ms on average

In the same table of 10000 records and a priority array of 1000, CASE executed in 35 ms on average Field instead took 24 ms on average

As observed, FIELD is better but not by a significant amount. Of course, it might get significant depending on the query. I am leaning on using CASE only so as to have better consistency of behavior across all grammars.

Food for thought I also wondered on whether if it would not be better to let the user decide on how he wants to deal with the values not found in priority by letting the user specify the default value to return in case the value was not found in the priority. This could lead to some interesting applications with this added flexibility.

For example, in the postgres grammar, I would set this instead

public function orderByPriority(string $column, array $priority, string $direction = 'asc', int $defaultIndex = 0)
{
    $column = $this->wrap($column);

    $cases = [];

    foreach ($priority as $index => $value) {
        $cases[] = "when {$column} = ? then {$index}";
    }

    $caseStatement = 'case '.implode(' ', $cases).' else '.$defaultIndex.' end';

    return "{$caseStatement} {$direction}";
}

Mushood avatar Aug 21 '24 19:08 Mushood

I question if we really need this at the database layer? 😬

$ids = [9, 10, 7, 8, 4, 6, 5, 2, 3, 1];

return User::whereIn('id', $ids)
    ->get()
    ->sortBy(fn ($model) => array_search($model->id, $ids))
    ->values();

taylorotwell avatar Aug 22 '24 13:08 taylorotwell

Drafting pending further discussion around inconsistencies around database drivers.

taylorotwell avatar Aug 22 '24 18:08 taylorotwell

This kinda creates horrible queries for any query with over 10 records, but I think it would be good to actually provide a benchmark against a php-side sorting to see if it actually improves speed in a general sense (which I think it does).

Whenever sorting/aggregating/selecting can be done on the database side, it's usually a performance benefit.

bert-w avatar Aug 22 '24 22:08 bert-w

@taylorotwell I agree with sorting in php, if we are retrieving all the results. However, in some cases, it might be needed before retrieval such as providing a paginated list. I also think that when we think in terms of ids, it might be less intuitive.

Here is another example of a use case ` $roles = ['moderator', 'admin', 'user'];

return User::orderByPriority('role', $roles) ->paginate(10); `

Mushood avatar Aug 23 '24 04:08 Mushood

@bert-w I do agree that it can create horrible queries and if we are retrieving all results before ordering, I feel PHP ordering might have an edge.

However, lets assume i have a table with 1M+ records of users with a role column with the possible values : 'moderator' 'admin' 'user'

I want to display a list of users, with the order above. It might not make sense to load 1M+ records in memory before sorting.

Anyway I'll try to benchmark this and post the results here.

Mushood avatar Aug 23 '24 04:08 Mushood

I can vouch for the performance. Using ORDER BY FIELD (…) and then $query->cursor() to iterate over the records is less memory intensive than sorting a collection, especially when you have lots of records to sort and lots of values to sort them by.

ziadoz avatar Aug 27 '24 08:08 ziadoz

I have thought about this and my conclusion would be to use CASE even for mysql. This would provide a consistent output, independent of database driver, for cases where the values is not present in the result set.

The behavior would be:

  1. Ascending: Values not in list would be at the end
  2. Descending: Values not in list would be first

Let me know what you think

If agreed, i will move the code back to builder itself instead of the corresponding builder grammar since it is the same implementation.

From my tests, I see that field is indeed faster, but marginally faster. I think consistency is more important.

image

image

Mushood avatar Aug 30 '24 05:08 Mushood