
Distributed deployment cluster with a single dead shard fails to respond to queries

Open azhelev opened this issue 1 year ago • 11 comments

Running a distributed deployment of Qdrant on Kubernetes with a single collection and a replication factor of 2. One shard failed, and now the cluster fails to respond to queries. I thought the loss of a single shard in this configuration shouldn't be a problem.

Current Behavior

The cluster fails to run a query

2024-02-11 17:28:39 INFO Received input: session_id=62049 query='xxxx xxxxx xxxxx' team_id=1866 file_id=[3074553] sitemap_id=None env_name='xxxxxx'
INFO:     100.100.29.236:39890 - "POST /qdrant/conversation HTTP/1.1" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  
  File "/home/searchie/.local/lib/python3.10/site-packages/qdrant_client/http/api_client.py", line 97, in send
    raise UnexpectedResponse.for_response(response)
qdrant_client.http.exceptions.UnexpectedResponse: Unexpected Response: 500 (Internal Server Error)
Raw response content:
b'{"status":{"error":"Service internal error: The replica set for shard 3 on peer 6358360577973509 does not have enough active replicas"},"time":0.000686151}'

Steps to Reproduce

  1. Have one shard of a distributed deployment cluster, with replication factor = 2, fail
  2. Run a query
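
For reference, a minimal sketch of the kind of search call that produces the error above, using qdrant_client (the URL and collection name are placeholders; the 768-dimensional vector matches the collection config shown under Context below):

from qdrant_client import QdrantClient

# Placeholder URL and collection name (not taken from the issue);
# the 768-dim vector size matches the collection config reported below.
client = QdrantClient(url="http://qdrant-0.qdrant-headless:6333")

# With one replica of shard 3 marked Dead, this call comes back as
# HTTP 500 "does not have enough active replicas" instead of results.
hits = client.search(
    collection_name="my_collection",
    query_vector=[0.0] * 768,
    limit=10,
)
print(hits)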

Expected Behavior

Qdrant should continue to index new vectors and respond to search requests.

Possible Solution

Context (Environment)

3-node cluster of Qdrant 1.7.3 on Kubernetes, installed with the Helm chart qdrant-0.7.5. Each node is an AWS EC2 r6a.2xlarge with 8 vCPUs and 64 GB RAM, and a 100 GB gp3 EBS volume.
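
For context, the three JSON blocks below look like the output of the standard cluster and collection info endpoints; a sketch of fetching them with requests (the node URL and collection name are assumptions):

import requests

BASE = "http://qdrant-0.qdrant-headless:6333"  # default REST port; assumption
COLLECTION = "my_collection"                   # placeholder collection name

# Cluster status: peers and raft_info (first JSON block below).
print(requests.get(f"{BASE}/cluster").json())

# Collection info: config and point counts (second JSON block below).
print(requests.get(f"{BASE}/collections/{COLLECTION}").json())

# Shard placement for the collection: local/remote shards and their
# states, including the Dead shard 3 (third JSON block below).
print(requests.get(f"{BASE}/collections/{COLLECTION}/cluster").json())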

{
  "result": {
    "status": "enabled",
    "peer_id": 3141207761334255,
    "peers": {
      "6358360577973509": {
        "uri": "http://qdrant-1.qdrant-headless:6335/"
      },
      "419724648802618": {
        "uri": "http://qdrant-2.qdrant-headless:6335/"
      },
      "3141207761334255": {
        "uri": "http://qdrant-0.qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 364,
      "commit": 14441,
      "pending_operations": 0,
      "leader": 6358360577973509,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-12T12:40:42.218687606Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.000006651
}
{
  "result": {
    "status": "green",
    "optimizer_status": "ok",
    "vectors_count": 14122727,
    "indexed_vectors_count": 14105677,
    "points_count": 14122470,
    "segments_count": 48,
    "config": {
      "params": {
        "vectors": {
          "size": 768,
          "distance": "Cosine",
          "on_disk": true
        },
        "shard_number": 6,
        "replication_factor": 2,
        "write_consistency_factor": 1,
        "on_disk_payload": true
      },
      "hnsw_config": {
        "m": 16,
        "ef_construct": 100,
        "full_scan_threshold": 10000,
        "max_indexing_threads": 0,
        "on_disk": true
      },
      "optimizer_config": {
        "deleted_threshold": 0.2,
        "vacuum_min_vector_number": 1000,
        "default_segment_number": 0,
        "max_segment_size": null,
        "memmap_threshold": null,
        "indexing_threshold": 20000,
        "flush_interval_sec": 5,
        "max_optimization_threads": 1
      },
      "wal_config": {
        "wal_capacity_mb": 32,
        "wal_segments_ahead": 0
      },
      "quantization_config": null
    },
    "payload_schema": {
      "metadata.model_id": {
        "data_type": "integer",
        "points": 14122470
      },
      "metadata.team_id": {
        "data_type": "integer",
        "points": 14122470
      }
    }
  },
  "status": "ok",
  "time": 0.001136403
}
{
  "result": {
    "peer_id": 6358360577973509,
    "shard_count": 6,
    "local_shards": [
      {
        "shard_id": 0,
        "points_count": 2341907,
        "state": "Active"
      },
      {
        "shard_id": 2,
        "points_count": 1944223,
        "state": "Active"
      },
      {
        "shard_id": 3,
        "points_count": 2434372,
        "state": "Dead"
      },
      {
        "shard_id": 5,
        "points_count": 2756990,
        "state": "Active"
      }
    ],
    "remote_shards": [
      {
        "shard_id": 0,
        "peer_id": 419724648802618,
        "state": "Active"
      },
      {
        "shard_id": 1,
        "peer_id": 3141207761334255,
        "state": "Active"
      },
      {
        "shard_id": 1,
        "peer_id": 419724648802618,
        "state": "Active"
      },
      {
        "shard_id": 2,
        "peer_id": 3141207761334255,
        "state": "Active"
      },
      {
        "shard_id": 3,
        "peer_id": 419724648802618,
        "state": "Active"
      },
      {
        "shard_id": 4,
        "peer_id": 3141207761334255,
        "state": "Active"
      },
      {
        "shard_id": 4,
        "peer_id": 419724648802618,
        "state": "Active"
      },
      {
        "shard_id": 5,
        "peer_id": 3141207761334255,
        "state": "Active"
      }
    ],
    "shard_transfers": []
  },
  "status": "ok",
  "time": 0.000043171
}

The cluster writes log entries like this every 10 seconds:

qdrant-1 qdrant 2024-02-12T13:18:49.889916Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Wrong input: Cannot deactivate the last active replica 419724648802618 of shard 3    
qdrant-2 qdrant 2024-02-12T13:18:49.894157Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Wrong input: Cannot deactivate the last active replica 419724648802618 of shard 3    
qdrant-0 qdrant 2024-02-12T13:18:49.894543Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Wrong input: Cannot deactivate the last active replica 419724648802618 of shard 3

Detailed Description

azhelev avatar Feb 12 '24 13:02 azhelev

Hey @azhelev, could you please check if your cluster has a consistent state? You need to make sure that

    "raft_info": {
      "term": 364,
      "commit": 14441,

are the same on all nodes.
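
For example, a quick way to compare these values across the three pods (a sketch assuming the pod names from this issue and the default REST port 6333):

import requests

# Pod DNS names taken from the issue; the REST port 6333 is an assumption.
nodes = [
    "http://qdrant-0.qdrant-headless:6333",
    "http://qdrant-1.qdrant-headless:6333",
    "http://qdrant-2.qdrant-headless:6333",
]

for node in nodes:
    raft = requests.get(f"{node}/cluster").json()["result"]["raft_info"]
    print(node, "term:", raft["term"], "commit:", raft["commit"])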

generall avatar Feb 12 '24 14:02 generall

Hi @generall, the state looks fine:

{
  "result": {
    "status": "enabled",
    "peer_id": 3141207761334255,
    "peers": {
      "419724648802618": {
        "uri": "http://qdrant-2.qdrant-headless:6335/"
      },
      "3141207761334255": {
        "uri": "http://qdrant-0.qdrant-headless:6335/"
      },
      "6358360577973509": {
        "uri": "http://qdrant-1.qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 364,
      "commit": 15599,
      "pending_operations": 0,
      "leader": 6358360577973509,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-12T15:53:53.509658301Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.00000703
}
{
  "result": {
    "status": "enabled",
    "peer_id": 6358360577973509,
    "peers": {
      "419724648802618": {
        "uri": "http://qdrant-2.qdrant-headless:6335/"
      },
      "3141207761334255": {
        "uri": "http://qdrant-0.qdrant-headless:6335/"
      },
      "6358360577973509": {
        "uri": "http://qdrant-1.qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 364,
      "commit": 15600,
      "pending_operations": 0,
      "leader": 6358360577973509,
      "role": "Leader",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-12T15:54:08.275015449Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.00000626
}
{
  "result": {
    "status": "enabled",
    "peer_id": 419724648802618,
    "peers": {
      "3141207761334255": {
        "uri": "http://qdrant-0.qdrant-headless:6335/"
      },
      "419724648802618": {
        "uri": "http://qdrant-2.qdrant-headless:6335/"
      },
      "6358360577973509": {
        "uri": "http://qdrant-1.qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 364,
      "commit": 15601,
      "pending_operations": 0,
      "leader": 6358360577973509,
      "role": "Follower",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-12T15:54:16.767143549Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.00001385
}

azhelev avatar Feb 12 '24 15:02 azhelev

Which peer is down? I assume 419724648802618.

generall avatar Feb 12 '24 21:02 generall

From what I understand, no peer is down; they are all up, and the GET /readyz endpoint responds with 200 OK on all of them. Only shard 3 on 6358360577973509 is marked as Dead.

azhelev avatar Feb 13 '24 05:02 azhelev

Hm, I don't see a reason why a request would fail in this configuration. It is also interesting that shard recovery has not been initiated.
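
Not a confirmed fix, but one manual workaround to try is re-replicating the dead shard from its healthy peer via the collection cluster API. A sketch using the peer IDs from this issue (the collection name is a placeholder, and whether the operation is accepted while a Dead copy of the shard already exists on the target peer may need checking):

import requests

BASE = "http://qdrant-0.qdrant-headless:6333"  # any node's REST API; assumption
COLLECTION = "my_collection"                   # placeholder collection name

# Shard 3 is Dead on peer 6358360577973509 while its copy on peer
# 419724648802618 is still Active, so re-replicate from the healthy peer.
operation = {
    "replicate_shard": {
        "shard_id": 3,
        "from_peer_id": 419724648802618,
        "to_peer_id": 6358360577973509,
    }
}
resp = requests.post(f"{BASE}/collections/{COLLECTION}/cluster", json=operation)
print(resp.json())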

generall avatar Feb 13 '24 20:02 generall

Same here. GET /cluster:

{
  "result": {
    "status": "enabled",
    "peer_id": 399284531266390,
    "peers": {
      "3623990355959938": {
        "uri": "http://urlslab-qdrant-1.urlslab-qdrant-headless:6335/"
      },
      "3536642733441919": {
        "uri": "http://urlslab-qdrant-2.urlslab-qdrant-headless:6335/"
      },
      "399284531266390": {
        "uri": "http://urlslab-qdrant-0.urlslab-qdrant-headless:6335/"
      }
    },
    "raft_info": {
      "term": 29,
      "commit": 48609,
      "pending_operations": 0,
      "leader": 399284531266390,
      "role": "Leader",
      "is_voter": true
    },
    "consensus_thread_status": {
      "consensus_thread_status": "working",
      "last_update": "2024-02-18T07:58:01.971770346Z"
    },
    "message_send_failures": {}
  },
  "status": "ok",
  "time": 0.00001001
}

Qdrant Server logs:

qdrant 2024-02-18T07:59:37.044781Z  WARN storage::content_manager::consensus_manager: Failed to apply collection meta operation entry with user error: Wrong input: Cannot deactivate the last active replica 3536642733441919 of shard 18

At query time:

Service internal error: The replica set for shard 10 on peer 399284531266390 has no active replica

yasha-dev1 avatar Feb 18 '24 08:02 yasha-dev1

@generall, I think part of the issue is that shards don't get restarted correctly when they are dead. When I restarted all Qdrant nodes, everything went back to normal again, but then the same issue happens after a while... there could be a bug in shard recovery.

yasha-dev1 avatar Feb 19 '24 06:02 yasha-dev1

Could you please describe the scenario in which this happened?

generall avatar Feb 19 '24 09:02 generall

We also experience this bug with a distributed Qdrant deployment. When we upload a collection from a snapshot, the collection exists on all nodes but only contains points on the node that processed the upload.

tZimmermann98 avatar May 10 '24 15:05 tZimmermann98

When we upload a collection from a snapshot, the collection exists on all nodes but only contains points on the node that processed the upload

Please make sure you are following the steps of the tutorial correctly - https://qdrant.tech/documentation/tutorials/create-snapshot/

In particular, make sure you are using the ?priority=snapshot parameter on recovery.
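
For illustration, a sketch of that parameter in use when uploading a snapshot file directly to a node (the node URL, collection name, and snapshot file name are placeholders):

import requests

NODE = "http://qdrant-node-0:6333"   # placeholder node URL
COLLECTION = "my_collection"         # placeholder collection name

# Upload the snapshot file to the node, telling it to prefer the
# snapshot's data over the state it currently holds for this collection.
with open("my_collection.snapshot", "rb") as snapshot_file:
    resp = requests.post(
        f"{NODE}/collections/{COLLECTION}/snapshots/upload",
        params={"priority": "snapshot"},
        files={"snapshot": snapshot_file},
    )
print(resp.json())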

generall avatar May 10 '24 18:05 generall

Are there any plans to implement a way for Qdrant to handle distributing data over the nodes itself, so that you only have to upload a collection once to a dashboard or a single API endpoint? We are deploying Qdrant in a Kubernetes cluster; the dashboard is behind a Service that routes to the endpoints of the pods, so we don't have control over the traffic routing - we can communicate with the individual pods inside the cluster, but not from outside. It would be nice if Qdrant were able to handle that, just like distributed database systems such as Stolon.

JWandscheer avatar May 17 '24 10:05 JWandscheer