[bug] async client: query list value type mismatch
When querying text with the async client, list-type values fail to parse.
The failure occurs in the __deserialize_non_ref_prop method in the base_executor file. That method switches to a separate list parser for server versions 1.25 and above, and that parser is what breaks (it seems the async client is still receiving the format expected for versions below 1.25).
if value.HasField("list_value"):
    return (
        self.__deserialize_list_value_prop_125(value.list_value)
        if self._connection._weaviate_version.is_at_least(1, 25, 0)
        else self.__deserialize_list_value_prop_123(value.list_value)
    )
If I manually edit the package to always use __deserialize_list_value_prop_123, the async query works, so I assume the fix is to update the async client's response handlers.
For reference, here is a simplified version of how I'm calling it:
self.weaviate_async_client = weaviate.use_async_with_weaviate_cloud(
    cluster_url=os.getenv('WEAVIATE_HIGH_PERFORMANCE_CLUSTER_URL'),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv('WEAVIATE_HIGH_PERFORMANCE_API_KEY')),
    skip_init_checks=True,
    headers={
        "X-OpenAI-Api-Key": os.getenv('OPENAI_API_KEY'),
        "X-Cohere-Api-Key": os.getenv('COHERE_API_KEY'),
        "X-VoyageAI-Api-Key": os.getenv('VOYAGE_API_KEY')
    }
)
collection_name = 'Posts'
collection = agent.database.weaviate_async_client.collections.get(collection_name)
await agent.database.weaviate_async_client.connect()
search_query = "decentralized training"
results = await collection.query.near_text(query=search_query, limit=10)  # -- error
Python version: 3.11
weaviate version: weaviate-client-4.16.10
Hi @DustinJamesT, which version is your server running on in WCD? When I try your code using a local server running 1.32.0, I can't replicate the problem 🤔
Hey @tsmith023 -- thanks for looking into this. I'm still getting the error on my end, so here are more details on my setup:
Weaviate Cloud Version: 1.32.5
Weaviate SDK Version (python): weaviate-client==4.16.10
Python Version: 3.11.9
I put print statements into the __deserialize_list_value_prop_125 function to show the different formats I'm seeing from the async vs. the sync client.
For clarity, this happens only on array fields. The example below shows how a text array property is represented differently inside __deserialize_list_value_prop_125 on my end across the two client types.
Sync Client (works)
[weaviate] __deserialize_list_value_prop_125 type <class 'v1.properties_pb2.ListValue'>
[weaviate] __deserialize_list_value_prop_125 text_values {
  values: "Bittensor"
}
Async Client (breaks)
[weaviate] __deserialize_list_value_prop_125 type <class 'v1.properties_pb2.ListValue'>
[weaviate] __deserialize_list_value_prop_125 values {
  string_value: "Bittensor"
}
All I'm changing between the two calls is whether I invoke the sync or the async client (I'm invoking it from a Jupyter notebook for testing purposes).
Is there something I should be trying differently?
Wow, I can see what's wrong and it's very interesting!
The issue is that collection = agent.database.weaviate_async_client.collections.get(collection_name) comes before await agent.database.weaviate_async_client.connect(), so the collection object does not have the correct server version to run its backwards-compatibility (BC) checks and parse the response with the correct gRPC stub.
For now, you will need to call connect before creating the collection object, but this is definitely a niche bug that we should fix. Thanks for raising!
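As a rough illustration of why the order matters, here is a toy model in plain Python. ToyConnection, ToyCollection, and ToyClient are hypothetical stand-ins, not weaviate-client classes, and the assumption that version-dependent behaviour is fixed at the moment the collection object is created is a simplification of the explanation above, not a description of the client's internals.

```python
import asyncio
from typing import Optional, Tuple

Version = Tuple[int, int, int]


class ToyConnection:
    """Hypothetical connection; the server version is unknown until connect()."""

    def __init__(self) -> None:
        self.server_version: Optional[Version] = None

    async def connect(self) -> None:
        # Pretend the server reported its version during the handshake.
        self.server_version = (1, 32, 5)


class ToyCollection:
    """Hypothetical collection handle that snapshots the version when built."""

    def __init__(self, connection: ToyConnection) -> None:
        # Assumption: an unknown version is treated as "older than 1.25".
        version = connection.server_version or (0, 0, 0)
        self.uses_125_api = version >= (1, 25, 0)


class ToyClient:
    def __init__(self) -> None:
        self._connection = ToyConnection()

    async def connect(self) -> None:
        await self._connection.connect()

    def get_collection(self) -> ToyCollection:
        return ToyCollection(self._connection)


async def demo() -> Tuple[bool, bool]:
    # Problematic order: collection handle created before connecting.
    early_client = ToyClient()
    early = early_client.get_collection()
    await early_client.connect()

    # Workaround order: connect first, then create the collection handle.
    late_client = ToyClient()
    await late_client.connect()
    late = late_client.get_collection()

    return early.uses_125_api, late.uses_125_api


early_flag, late_flag = asyncio.run(demo())
print(early_flag, late_flag)  # False True
```

In the real client the workaround is the same reordering: move await client.connect() above client.collections.get(...).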
MRE for future development:
@pytest.mark.asyncio
async def test_query_for_list_value(async_collection_factory: AsyncCollectionFactory) -> None:
    client = weaviate.use_async_with_local()
    collection = client.collections.get("TestCollection")
    await client.connect()
    await client.collections.delete("TestCollection")
    await client.collections.create(
        "TestCollection",
        properties=[
            Property(name="name", data_type=DataType.TEXT),
            Property(name="tags", data_type=DataType.TEXT_ARRAY),
        ],
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_contextionary(vectorize_collection_name=False),
    )
    await collection.data.insert_many(
        [
            {"name": "John Doe", "tags": ["tag1", "tag2"]},
            {"name": "Jane Doe", "tags": ["tag2", "tag3"]},
        ]
    )
    res = await collection.query.near_text(query="Jane")
    assert len(res.objects) == 2
    assert res.objects[0].properties["name"] == "Jane Doe"
    assert res.objects[1].properties["name"] == "John Doe"
    assert res.objects[0].properties["tags"] == ["tag2", "tag3"]
    assert res.objects[1].properties["tags"] == ["tag1", "tag2"]
throws:
    def __deserialize_list_value_prop_125(
        self, value: properties_pb2.ListValue
    ) -> Optional[List[Any]]:
        if value.HasField("bool_values"):
            return list(value.bool_values.values)
        if value.HasField("date_values"):
            return [_datetime_from_weaviate_str(val) for val in value.date_values.values]
        if value.HasField("int_values"):
            return _ByteOps.decode_int64s(value.int_values.values)
        if value.HasField("number_values"):
            return _ByteOps.decode_float64s(value.number_values.values)
        if value.HasField("text_values"):
            return list(value.text_values.values)
        if value.HasField("uuid_values"):
            return [uuid_lib.UUID(val) for val in value.uuid_values.values]
        if value.HasField("object_values"):
            return [
                self.__parse_nonref_properties_result(val) for val in value.object_values.values
            ]
>       _Warnings.unknown_type_encountered(value.WhichOneof("value"))
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
E       ValueError: Protocol message ListValue has no "value" field.

self = <weaviate.collections.query._QueryCollectionAsync object at 0x10602ad10>
value = values {
  string_value: "tag2"
}
values {
  string_value: "tag3"
}
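To make the mismatch concrete, the two shapes seen in the debug output can be modelled with plain dicts. This is only a sketch: parse_list_value is a hypothetical helper written for this illustration, and the dicts imitate the printed protobuf messages rather than using the real properties_pb2 types.

```python
from typing import Any, Dict, List


def parse_list_value(value: Dict[str, Any]) -> List[Any]:
    """Toy parser over dicts that mimic the two printed ListValue shapes."""
    # Typed-container shape: all strings sit under a single text_values entry.
    if "text_values" in value:
        return list(value["text_values"]["values"])
    # Per-item shape: a repeated list where each item carries its own type.
    if "values" in value:
        return [item["string_value"] for item in value["values"]]
    raise ValueError(f"unrecognised list shape: {sorted(value)}")


# Shape printed by the sync client in the report above.
typed = {"text_values": {"values": ["tag2", "tag3"]}}
# Shape shown in the failing traceback.
per_item = {"values": [{"string_value": "tag2"}, {"string_value": "tag3"}]}

assert parse_list_value(typed) == ["tag2", "tag3"]
assert parse_list_value(per_item) == ["tag2", "tag3"]
```

A parser that only checks for the typed containers, as in the traceback, falls through every branch when it is handed the per-item shape.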
ahhh interesting -- perfect that's a super easy change. Thank you!