
batch delete (delete_many) fails with unclear error on timeout.

Open david-moonsift opened this issue 2 years ago • 0 comments

Following the documentation here: https://weaviate.io/developers/weaviate/manage-data/delete

I have implemented a deletion setup as follows:

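from datetime import datetime

from weaviate.classes.query import Filter

# `client` is an already-connected Weaviate client (v4 API)
collection = client.collections.get(collection_name)
end_date = datetime.strptime("2024-04-09T20:16:10Z", "%Y-%m-%dT%H:%M:%SZ")
# Match every object scraped on or before end_date
where_clause = Filter.by_property("scrape_date").less_or_equal(end_date)
collection.data.delete_many(
    where=where_clause,
)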

Small deletions (fewer than roughly 3,000 objects, as far as I can tell) work fine. But when many vectors match the filter, I receive this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 33, in batch_delete
    res, _ = self._connection.grpc_stub.BatchDelete.with_call(
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1193, in with_call
    return _end_unary_response_blocking(state, call, True, None)
  File "/usr/local/lib/python3.10/site-packages/grpc/_channel.py", line 1005, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "unavailable"
	debug_error_string = "UNKNOWN:Error received from peer  {created_time:"2024-04-29T15:18:19.331980027+00:00", grpc_status:14, grpc_message:"unavailable"}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/data.py", line 162, in delete_many
    return self._batch_delete_grpc.batch_delete(
  File "/usr/local/lib/python3.10/site-packages/weaviate/collections/batch/grpc_batch_delete.py", line 67, in batch_delete
    raise WeaviateDeleteManyError(e.details())  # pyright: ignore
weaviate.exceptions.WeaviateDeleteManyError: Query call with protocol GRPC delete failed with message unavailable.

This contradicts the linked documentation (https://weaviate.io/developers/weaviate/manage-data/delete), which states:

There is a configurable [maximum limit (QUERY_MAXIMUM_RESULTS)](https://weaviate.io/developers/weaviate/config-refs/env-vars#general) on the number of objects that can be deleted in a single query (default 10,000). To delete more objects than the limit, re-run the query.

Re-running the query does not result in all data being deleted: the total number of vectors in the collection is unchanged, and the query fails every time rather than eventually succeeding once (presumably) no matching documents remain.
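
For reference, this is roughly the re-run loop I would expect to work based on that documentation. It is only a minimal sketch: `DeleteResult` and `delete_until_empty` are hypothetical names of my own, and I am assuming the object returned by `delete_many` exposes `matches` and `successful` counts.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DeleteResult:
    """Stand-in for the delete_many return value (field names are assumed)."""
    matches: int      # objects that matched the filter in this call
    successful: int   # objects actually deleted in this call


def delete_until_empty(delete_once: Callable[[], DeleteResult], max_rounds: int = 100) -> int:
    """Re-run a deletion until nothing matches; return the total number deleted."""
    total = 0
    for _ in range(max_rounds):
        result = delete_once()
        total += result.successful
        if result.matches == 0:
            return total
        if result.successful == 0:
            # Objects still match but none were deleted: stop rather than loop forever.
            raise RuntimeError(f"no progress after deleting {total} objects")
    raise RuntimeError(f"still matching after {max_rounds} rounds ({total} deleted)")
```

With the client, `delete_once` could be something like `lambda: collection.data.delete_many(where=where_clause)`. In my case the very first call raises, so a loop like this never makes progress.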

I've noticed that the failure consistently occurs after about 60 seconds, so I presume a timeout is the underlying cause of the error and of the failed batch deletion.
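
If the 60-second limit really is the cause, one possible workaround is to split the deletion into smaller date windows so that each call matches fewer objects. The helper below is a sketch under that assumption; `date_windows` is a hypothetical name, and the start date and step size would need to be chosen to suit the data.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def date_windows(
    start: datetime, end: datetime, step: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive half-open [lo, hi) windows that together cover [start, end)."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    lo = start
    while lo < end:
        hi = min(lo + step, end)  # clamp the final window to `end`
        yield lo, hi
        lo = hi
```

Each window could then be turned into a combined `greater_or_equal(lo)` and `less_than(hi)` filter on `scrape_date` and passed to `delete_many`. Because the windows are half-open, objects stamped exactly at `end` would need a final `less_or_equal` call to match the original filter.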

It would be helpful if the raised error made this clearer, for example by indicating that the request timed out.
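
To illustrate the kind of message I have in mind, here is a minimal caller-side sketch that records how long the call ran before failing. `TimedCallError` and `call_with_timing` are hypothetical names, not part of the client, and this only wraps the existing exception rather than changing what the library raises.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TimedCallError(RuntimeError):
    """Hypothetical error that reports how long a call ran before it failed."""


def call_with_timing(fn: Callable[[], T], label: str) -> T:
    """Run fn(); on failure, re-raise with the elapsed time included in the message."""
    start = time.monotonic()
    try:
        return fn()
    except Exception as exc:
        elapsed = time.monotonic() - start
        # Chain the original exception so its traceback is not lost.
        raise TimedCallError(f"{label} failed after {elapsed:.1f}s: {exc}") from exc
```

Wrapping the call as `call_with_timing(lambda: collection.data.delete_many(where=where_clause), "delete_many")` would at least make a consistent 60-second cutoff visible in the error text.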

david-moonsift, Apr 29 '24 15:04