Inconsistent execution statistics for Mango queries

Open pgj opened this issue 2 years ago • 0 comments

When requested, the Mango execution statistics do not always reflect their actual values but something that is less than or simply zero, most of the times. This happens on the latest version of CouchDB (as the time of writing, which is 82aa1625) and the issue can be reproduced as follows. The commands below create a simple database, called test with 25 documents of a single field, for which an index is defined.

curl -sS -X PUT "$COUCHDB_URL"/test
for i in $(jot - 1 25); do \
  curl -sS -X POST "$COUCHDB_URL"/test -H "Content-Type: application/json" -d '{"a": '"$i"'}'; done
curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_index \
  -d '{"index": {"fields": ["a"]}, "name": "a", "type": "json"}'

Then, using a selector like {"a": {"$lt": 20}} and limit of 1, and with execution statistics enabled, the total_docs_examined field becomes zero. The other _examined fields are either not implemented or not affected by queries like that. It is visible that results_returned is properly accounted at the same time.

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 1, "selector": {"a": {"$lt": 20}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 0,
  "total_quorum_docs_examined": 0,
  "results_returned": 1,
  "execution_time_ms": 1.067
}

Note that if either limit is increased or the search criteria on the a field is changed to find less documents, more realistic data is returned.

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 20, "selector": {"a": {"$lt": 20}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 20,
  "total_quorum_docs_examined": 0,
  "results_returned": 19,
  "execution_time_ms": 2.378
}

$ curl -sS -X POST -H "Content-Type: application/json" "$COUCHDB_URL"/test/_find \
  -d '{"execution_stats": true, "limit": 1, "selector": {"a": {"$lt": 2}}}' \
  | jq '.execution_stats'
{
  "total_keys_examined": 0,
  "total_docs_examined": 2,
  "total_quorum_docs_examined": 0,
  "results_returned": 1,
  "execution_time_ms": 0.845
}

After some debugging, the source of the issue has been identified as the emission of stop in mango_cursor_view:handle_doc/2 when the limit reaches zero (last clause).

https://github.com/apache/couchdb/blob/82aa1625ca8b652f7db3dadad741e86239641921/src/mango/src/mango_cursor_view.erl#L483-L497

The stop action immediately stops the processing of messages from the shards, including the shard-level statistics that are submitted in response to the complete message.

https://github.com/apache/couchdb/blob/82aa1625ca8b652f7db3dadad741e86239641921/src/mango/src/mango_cursor_view.erl#L385-L390

That is why when the limit is too low, there is not enough time to receive and process the related message. Therefore the execution statistics are not, or only partially (from a set of shards) handled.

I could not yet come up with a satisfying solution but I create this ticket to raise awareness about this bug. Changing stop to ok in mango_cursor_view:handle_doc/2 helps with the consistency but then too many results are sent by the shards which will be known to be discarded already. This potentially has an impact on the performance, although it has not been measured how much.

May 01 '23 23:05 pgj