Add orderByPriority Method for Custom Sorting in Query Builder V2
Building upon the work done in PR #52483
Explanation: This commit introduces the orderByPriority method to Laravel's query builder, enabling developers to sort query results based on a custom priority array. For example, you can now order records by specific IDs in a preferred sequence like this:
Model::whereIn('id', [2, 4, 6, 1, 3, 5])->orderByPriority('id', [2, 4, 6, 1, 3, 5])->get();
For MYSQL and Mariadb This method leverages SQL's FIELD() function, allowing developers to easily display results in a user-defined order.
For Postgres, Sqlite and SQL server, the CASE() function was used to achieve the same result.
Compared to the previous PR, I have placed the logic in the corresponding grammar class as suggested to allow for flexibility of implementation depending on the database server.
Impact: This feature greatly enhances the flexibility of data retrieval, particularly when a specific order of items is crucial for the application's logic or user interface. We have to note that if the larger the array of priority, we might introduce performance issues
For reference:
- https://mariadb.com/docs/skysql-dbaas/ref/xpand/functions/FIELD/
- https://dev.mysql.com/doc/refman/8.4/en/string-functions.html#function_field
Please lowercase all the generated SQL: CASE WHEN => case when etc.
Please also add tests for Builder::orderByPriority(), especially the exceptions.
@staudenmeir
Thank you for taking the time to review.
I have applied lowercase for all the generated SQL. I have also added tests for exceptions and test for the function in the builder itself
I noticed that the two types of implementation behave differently with values that aren't part of the $priority array. If you use ->orderByPriority('id', ['c', 'b', 'a']) while the column also contains d and e, these records get treated differently depending on the database:
- MySQL and MariaDB put them first because
FIELD()returns0for unspecified values:'d', 'e', 'c', 'b', 'a' - The
CASE WHENimplementation puts them last because of theELSEpart:'c', 'b', 'a', 'd', 'e'
I think it makes much more sense to put them last, but I don't see an elegant way to achieve this behavior on MySQL/MariaDB.
Not sure if that's an issue. You could argue that orderByPriority() shouldn't be used this way.
Would it be worth using ORDER BY array_position(ARRAY[...], field::text) in Postgres? https://www.crossingtheruby.com/2023/10/12/postgres-custom-sort-order
@staudenmeir I concur with your observation
| database | present | absent | priority | direction | result |
|---|---|---|---|---|---|
| mysql/mariadb | A B C | D | A B C | asc | D A B C |
| mysql/mariadb | A B C | D | A B C | desc | C B A D |
| postgres/sqlite | A B C | D | A B C | asc | D C B A |
| postgres/sqlite | A B C | D | A B C | desc | A B C D |
In order to make this field behavior consistent with the case behavior, it might make sense to change the "else" condition
$caseStatement = 'case '.implode(' ', $cases).' else 0 end';
However, I do agree that it does not feel right that these values come first.
Observation I pondered on whether field was much better than case and by how much.
In a table of 5000 records and a priority array of 1000, CASE executed in 34 ms on average Field instead took 23 ms on average
In the same table of 10000 records and a priority array of 1000, CASE executed in 35 ms on average Field instead took 24 ms on average
As observed, FIELD is better but not by a significant amount. Of course, it might get significant depending on the query. I am leaning on using CASE only so as to have better consistency of behavior across all grammars.
Food for thought I also wondered on whether if it would not be better to let the user decide on how he wants to deal with the values not found in priority by letting the user specify the default value to return in case the value was not found in the priority. This could lead to some interesting applications with this added flexibility.
For example, in the postgres grammar, I would set this instead
public function orderByPriority(string $column, array $priority, string $direction = 'asc', int $defaultIndex = 0)
{
$column = $this->wrap($column);
$cases = [];
foreach ($priority as $index => $value) {
$cases[] = "when {$column} = ? then {$index}";
}
$caseStatement = 'case '.implode(' ', $cases).' else '.$defaultIndex.' end';
return "{$caseStatement} {$direction}";
}
I question if we really need this at the database layer? 😬
$ids = [9, 10, 7, 8, 4, 6, 5, 2, 3, 1];
return User::whereIn('id', $ids)
->get()
->sortBy(fn ($model) => array_search($model->id, $ids))
->values();
Drafting pending further discussion around inconsistencies around database drivers.
This kinda creates horrible queries for any query with over 10 records, but I think it would be good to actually provide a benchmark against a php-side sorting to see if it actually improves speed in a general sense (which I think it does).
Whenever sorting/aggregating/selecting can be done on the database side, it's usually a performance benefit.
@taylorotwell I agree with sorting in php, if we are retrieving all the results. However, in some cases, it might be needed before retrieval such as providing a paginated list. I also think that when we think in terms of ids, it might be less intuitive.
Here is another example of a use case ` $roles = ['moderator', 'admin', 'user'];
return User::orderByPriority('role', $roles) ->paginate(10); `
@bert-w I do agree that it can create horrible queries and if we are retrieving all results before ordering, I feel PHP ordering might have an edge.
However, lets assume i have a table with 1M+ records of users with a role column with the possible values : 'moderator' 'admin' 'user'
I want to display a list of users, with the order above. It might not make sense to load 1M+ records in memory before sorting.
Anyway I'll try to benchmark this and post the results here.
I can vouch for the performance. Using ORDER BY FIELD (…) and then $query->cursor() to iterate over the records is less memory intensive than sorting a collection, especially when you have lots of records to sort and lots of values to sort them by.
I have thought about this and my conclusion would be to use CASE even for mysql. This would provide a consistent output, independent of database driver, for cases where the values is not present in the result set.
The behavior would be:
- Ascending: Values not in list would be at the end
- Descending: Values not in list would be first
Let me know what you think
If agreed, i will move the code back to builder itself instead of the corresponding builder grammar since it is the same implementation.
From my tests, I see that field is indeed faster, but marginally faster. I think consistency is more important.