Batch delete (`delete_many`) fails with an unclear error on timeout
Following the documentation here: https://weaviate.io/developers/weaviate/manage-data/delete
I have implemented a deletion setup as follows:
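from datetime import datetime
from weaviate.classes.query import Filter

# `client` is an already-connected v4 client; `collection_name` is defined elsewhere.
collection = client.collections.get(collection_name)
# Delete every object whose scrape_date is on or before this cutoff.
end_date = datetime.strptime("2024-04-09T20:16:10Z", "%Y-%m-%dT%H:%M:%SZ")
where_clause = Filter.by_property("scrape_date").less_or_equal(end_date)
collection.data.delete_many(
    where=where_clause,
)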
Small deletions (fewer than roughly 3,000 objects, as far as I can tell) succeed. When many more objects match the filter, I receive this error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 33, in batch_delete
    res, _ = self._connection.grpc_stub.BatchDelete.with_call(
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1193, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state) # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "unavailable"
    debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-04-29T15:18:19.331980027+00:00", grpc_status:14, grpc_message:"unavailable"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/data.py", line 162, in delete_many
    return self._batch_delete_grpc.batch_delete(
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 67, in batch_delete
    raise WeaviateDeleteManyError(e.details()) # pyright: ignore
weaviate.exceptions.WeaviateDeleteManyError: Query call with protocol GRPC delete failed with message unavailable.
This behavior contradicts the linked documentation (https://weaviate.io/developers/weaviate/manage-data/delete), which states:
"There is a configurable [maximum limit (QUERY_MAXIMUM_RESULTS)](https://weaviate.io/developers/weaviate/config-refs/env-vars#general) on the number of objects that can be deleted in a single query (default 10,000). To delete more objects than the limit, re-run the query."
Re-running the query does not appear to delete the data incrementally: the total number of vectors in the collection stays unchanged, and the query keeps failing instead of eventually succeeding once (presumably) it would have reached the end of the matching documents.
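A possible workaround, untested against this failure, is to delete in smaller chunks so that each call stays well below the size at which the error appears. The sketch below keeps the loop independent of the client: `delete_in_chunks`, `fetch_ids`, and `delete_ids` are hypothetical names, and the wiring to the v4 client shown in the trailing comment is an assumption about how it could be connected, not something confirmed to avoid the timeout.

```python
from typing import Callable, List


def delete_in_chunks(
    fetch_ids: Callable[[int], List[str]],
    delete_ids: Callable[[List[str]], None],
    chunk_size: int = 1000,
    max_rounds: int = 10_000,
) -> int:
    """Repeatedly fetch up to chunk_size matching IDs and delete them.

    Returns the total number of IDs passed to delete_ids.
    """
    total = 0
    for _ in range(max_rounds):
        ids = fetch_ids(chunk_size)
        if not ids:
            return total  # nothing left that matches the filter
        delete_ids(ids)
        total += len(ids)
    # Guard against looping forever if deletions silently have no effect.
    raise RuntimeError(f"gave up after {max_rounds} rounds; {total} IDs deleted so far")


# Hypothetical wiring to the v4 Python client (assumed, not run here):
#   fetch_ids = lambda n: [
#       o.uuid
#       for o in collection.query.fetch_objects(filters=where_clause, limit=n).objects
#   ]
#   delete_ids = lambda ids: collection.data.delete_many(
#       where=Filter.by_id().contains_any(ids)
#   )
```

The `max_rounds` guard matters here because the symptom above is exactly that repeated calls leave the object count unchanged; without it, a loop like this could spin indefinitely.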
I've noticed that the failure consistently occurs after about 60 seconds, so I presume a timeout is the cause and that the batch deletion fails as a result.
It would be helpful if the raised error made this clearer.
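Until the error is more descriptive, one way to make the timing visible on the caller's side is to wrap the call and attach the elapsed time to whatever exception is raised. This is a minimal sketch: `TimedCallError` and `call_with_elapsed` are hypothetical names, not part of the Weaviate client, and the wrapper only measures wall-clock time; it cannot tell whether the limit is enforced by the client, the server, or something in between.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TimedCallError(RuntimeError):
    """Hypothetical wrapper error that records how long the failed call ran."""

    def __init__(self, elapsed: float, cause: BaseException):
        super().__init__(f"call failed after {elapsed:.1f}s: {cause}")
        self.elapsed = elapsed


def call_with_elapsed(fn: Callable[[], T]) -> T:
    """Run fn(); on failure, re-raise with the elapsed wall-clock time attached."""
    start = time.monotonic()
    try:
        return fn()
    except Exception as exc:
        # Chain the original exception so its traceback is preserved.
        raise TimedCallError(time.monotonic() - start, exc) from exc
```

With the snippet from this report, it could be used as `call_with_elapsed(lambda: collection.data.delete_many(where=where_clause))`; an elapsed time that clusters around 60 seconds would support the timeout theory.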