Performance Regression in Count Query with Indexed Field
Description:
I am experiencing a significant performance regression when counting documents in the reefer_measurements collection using the reefer_id index.
Index created on reefer_id:
db.reefer_measurements.createIndex({ reefer_id: 1 });
When executing the following query to count documents:
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
I observed the following performance times:
| Version | Time |
|---|---|
| Version 4.8.0 | 3,869.62ms |
| Version 5.0.0 | 22,550.24ms |
Environment:
- Laravel-mongodb Version: 5.0.0
- PHP Version: 8.2
- Laravel Version: Laravel: 11.29.0
- MongoDB extension version => 1.20.0
- Collection: reefer_measurements
- Number of documents: 4 million
Steps to reproduce
- Create a collection reefer_measurements with 4 million documents;
- Create an index on reefer_id;
- Execute the count query as demonstrated above;
- Compare the performance between versions 4.8.0 and 5.0.0.
Expected behaviour
The count operation should be optimized and utilize the index effectively, resulting in a much lower execution time.
Actual behaviour
The execution time has increased dramatically from version 4.8.0 to 5.0.0, leading to degraded performance in applications relying on this count query.
I would appreciate any guidance on potential causes for this performance regression and suggestions for optimization.
Could you enable logs and report the query that is executed on both package versions?
DB::enableQueryLog();
// The query
dd(DB::getQueryLog());
@GromNaN
DB::connection('mongodb')->enableQueryLog();
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
dd(DB::connection('mongodb')->getQueryLog());
Version 4.8.0
[
"query" => "reefer_measurements.countDocuments({"reefer_id":303872},[])"
"bindings" => []
"time" => 1514.84
]
Version 5.0.0
[
"query" => "{ "aggregate" : "measurements", "pipeline" : [ { "$match" : { "reefer_id" : { "$numberInt" : "303872" } } }, { "$group" : { "_id" : { "$numberInt" : "1" }, "n" : { "$sum" : { "$numberInt" : "1" } } } } ], "cursor" : { } }"
"bindings" => []
"time" => 22677580
]
Thanks. Both commands are identical (in 4.8, you see the high level API, and in 5.0 you see the actual server command). So the performance difference is due to something in the code.
It there the same time difference when you disable query logs?
@GromNaN I used minimal code with this profiler to see the time difference. Using getQueryLog shows approximately the same result, but there’s still a significant difference between 4.8.0 and 5.0.0.
$start = microtime(true);
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
$stop = microtime(true);
$time = ($stop - $start) * 1000;
echo number_format($time, 2).'ms';
I can't reproduce the time difference with a collection of the same size.
The MongoDB on the server is still running on version v4.2.25. We’ll be upgrading incrementally, and I’ll keep you updated on the results once we reach the latest release.