laravel-mongodb icon indicating copy to clipboard operation
laravel-mongodb copied to clipboard

Performance Regression in Count Query with Indexed Field

Open davidvandertuijn opened this issue 1 year ago • 6 comments

Description:

I am experiencing a significant performance regression when counting documents in the reefer_measurements collection using the reefer_id index.

Index created on reefer_id:

db.reefer_measurements.createIndex({ reefer_id: 1 });

When executing the following query to count documents:

$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();

I observed the following performance times:

Version Time
Version 4.8.0 3,869.62ms
Version 5.0.0 22,550.24ms

Environment:

  • Laravel-mongodb Version: 5.0.0
  • PHP Version: 8.2
  • Laravel Version: Laravel: 11.29.0
  • MongoDB extension version => 1.20.0
  • Collection: reefer_measurements
  • Number of documents: 4 million

Steps to reproduce

  1. Create a collection reefer_measurements with 4 million documents;
  2. Create an index on reefer_id;
  3. Execute the count query as demonstrated above;
  4. Compare the performance between versions 4.8.0 and 5.0.0.

Expected behaviour

The count operation should be optimized and utilize the index effectively, resulting in a much lower execution time.

Actual behaviour

The execution time has increased dramatically from version 4.8.0 to 5.0.0, leading to degraded performance in applications relying on this count query.

I would appreciate any guidance on potential causes for this performance regression and suggestions for optimization.

davidvandertuijn avatar Nov 05 '24 08:11 davidvandertuijn

Could you enable logs and report the query that is executed on both package versions?

DB::enableQueryLog();

// The query

dd(DB::getQueryLog());

GromNaN avatar Nov 05 '24 08:11 GromNaN

@GromNaN

DB::connection('mongodb')->enableQueryLog();
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
dd(DB::connection('mongodb')->getQueryLog());

Version 4.8.0

[
  "query" => "reefer_measurements.countDocuments({"reefer_id":303872},[])"
  "bindings" => []
  "time" => 1514.84
]

Version 5.0.0


[
    "query" => "{ "aggregate" : "measurements", "pipeline" : [ { "$match" : { "reefer_id" : { "$numberInt" : "303872" } } }, { "$group" : { "_id" : { "$numberInt" : "1" }, "n" : { "$sum" : { "$numberInt" : "1" } } } } ], "cursor" : {  } }"
    "bindings" => []
    "time" => 22677580
]

davidvandertuijn avatar Nov 05 '24 09:11 davidvandertuijn

Thanks. Both commands are identical (in 4.8, you see the high level API, and in 5.0 you see the actual server command). So the performance difference is due to something in the code.

It there the same time difference when you disable query logs?

GromNaN avatar Nov 05 '24 11:11 GromNaN

@GromNaN I used minimal code with this profiler to see the time difference. Using getQueryLog shows approximately the same result, but there’s still a significant difference between 4.8.0 and 5.0.0.

$start = microtime(true);
$db = ReeferMeasurementModel::where('reefer_id', '=', $reefer->id);
$total = $db->count();
$stop = microtime(true);
$time = ($stop - $start) * 1000;
echo number_format($time, 2).'ms';

davidvandertuijn avatar Nov 05 '24 11:11 davidvandertuijn

I can't reproduce the time difference with a collection of the same size.

GromNaN avatar Nov 05 '24 12:11 GromNaN

The MongoDB on the server is still running on version v4.2.25. We’ll be upgrading incrementally, and I’ll keep you updated on the results once we reach the latest release.

davidvandertuijn avatar Nov 08 '24 07:11 davidvandertuijn