weaviate-python-client

[bug] async client: query list value type mis-match

Open DustinJamesT opened this issue 6 months ago • 5 comments

When querying text with the async client, there is a bug parsing list-type values.

It occurs in the __deserialize_non_ref_prop method in the base_executor file. There is code that switches to a list parser for versions 1.25+, and that parser is what breaks (it seems the async client is still returning the format expected for versions < 1.25).

if value.HasField("list_value"):
    return (
        self.__deserialize_list_value_prop_125(value.list_value)
        if self._connection._weaviate_version.is_at_least(1, 25, 0)
        else self.__deserialize_list_value_prop_123(value.list_value)
    )

If I manually edit the package to always use __deserialize_list_value_prop_123, the async query works, so I assume the fix is to update the async client's response handlers.

For reference, here is a simplified look at how I'm calling it:

self.weaviate_async_client = weaviate.use_async_with_weaviate_cloud(
    cluster_url=os.getenv('WEAVIATE_HIGH_PERFORMANCE_CLUSTER_URL'),
    auth_credentials=weaviate.auth.AuthApiKey(os.getenv('WEAVIATE_HIGH_PERFORMANCE_API_KEY')),
    skip_init_checks=True,
    headers={
        "X-OpenAI-Api-Key": os.getenv('OPENAI_API_KEY'),
        "X-Cohere-Api-Key": os.getenv('COHERE_API_KEY'),
        "X-VoyageAI-Api-Key": os.getenv('VOYAGE_API_KEY'),
    },
)

collection_name = 'Posts'
collection = agent.database.weaviate_async_client.collections.get(collection_name)

await agent.database.weaviate_async_client.connect()

search_query = "decentralized training"
results = await collection.query.near_text(query=search_query, limit=10) # -- error 

Python version: 3.11
Weaviate client version: weaviate-client-4.16.10

DustinJamesT, Sep 18 '25 15:09

Hi @DustinJamesT, which version is your server running on in WCD? When I try your code using a local server running 1.32.0, I can't replicate the problem 🤔

tsmith023, Sep 19 '25 08:09

Hey @tsmith023, thanks for looking into this. I'm still getting the error on my end, so here are more details on my setup:

Weaviate Cloud version: 1.32.5
Weaviate SDK version (Python): weaviate-client==4.16.10
Python version: 3.11.9

I put print statements into the __deserialize_list_value_prop_125 function to show the different type formats I'm seeing from the async vs. the sync client.

To be clear, this happens only on array fields. The example below shows how a text array property is represented differently inside the __deserialize_list_value_prop_125 function on my end across the two client types.

Sync Client (works)

[weaviate] __deserialize_list_value_prop_125 type <class 'v1.properties_pb2.ListValue'>
[weaviate] __deserialize_list_value_prop_125 text_values {
  values: "Bittensor"
}

Async Client (breaks)

[weaviate] __deserialize_list_value_prop_125 type <class 'v1.properties_pb2.ListValue'>
[weaviate] __deserialize_list_value_prop_125 values {
  string_value: "Bittensor"
}

The only thing I change between the two calls is whether I invoke the sync or the async client (I'm running this from a Jupyter notebook for testing purposes).

Is there something I should be trying differently?

DustinJamesT, Sep 19 '25 12:09

Wow, I can see what's wrong and it's very interesting!

The issue is that collection = agent.database.weaviate_async_client.collections.get(collection_name) comes before await agent.database.weaviate_async_client.connect(), so the collection object does not have the correct server version to do its backwards-compatibility (BC) checks and parse the gRPC response correctly.

For now, you will need to call connect before creating the collection object, but this is definitely a niche bug that we should fix. Thanks for raising!
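
A minimal sketch of that workaround, not the library's eventual fix: connect first, then create the collection handle. The helper name search_connected and its parameters are hypothetical. It relies only on the calls already used in this thread (connect, collections.get, query.near_text) and takes the client as an argument so the ordering is easy to see.

```python
from typing import Any


async def search_connected(
    client: Any, collection_name: str, query: str, limit: int = 10
) -> Any:
    """Hypothetical helper: connect before building the collection handle."""
    # 1. Connect first, so the client knows the server version.
    await client.connect()
    # 2. Only now create the collection object (a plain, non-awaited call,
    #    as in the original post).
    collection = client.collections.get(collection_name)
    # 3. Run the query with the handle that was created after connecting.
    return await collection.query.near_text(query=query, limit=limit)
```

With the client from the original post, this would be called as results = await search_connected(agent.database.weaviate_async_client, 'Posts', 'decentralized training'). If the client is connected elsewhere in your code, the point is only the order: connect has to complete before collections.get is called.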

tsmith023, Sep 19 '25 13:09

MRE for future development:

@pytest.mark.asyncio
async def test_query_for_list_value(async_collection_factory: AsyncCollectionFactory) -> None:
    client = weaviate.use_async_with_local()
    collection = client.collections.get("TestCollection")
    await client.connect()
    await client.collections.delete("TestCollection")
    await client.collections.create(
        "TestCollection",
        properties=[
            Property(name="name", data_type=DataType.TEXT),
            Property(name="tags", data_type=DataType.TEXT_ARRAY),
        ],
        vectorizer_config=wvc.config.Configure.Vectorizer.text2vec_contextionary(vectorize_collection_name=False),
    )
    await collection.data.insert_many(
        [
            {"name": "John Doe", "tags": ["tag1", "tag2"]},
            {"name": "Jane Doe", "tags": ["tag2", "tag3"]},
        ]
    )

    res = await collection.query.near_text(query="Jane")
    assert len(res.objects) == 2
    assert res.objects[0].properties["name"] == "Jane Doe"
    assert res.objects[1].properties["name"] == "John Doe"
    assert res.objects[0].properties["tags"] == ["tag2", "tag3"]
    assert res.objects[1].properties["tags"] == ["tag1", "tag2"]

throws:

def __deserialize_list_value_prop_125(
        self, value: properties_pb2.ListValue
    ) -> Optional[List[Any]]:
        if value.HasField("bool_values"):
            return list(value.bool_values.values)
        if value.HasField("date_values"):
            return [_datetime_from_weaviate_str(val) for val in value.date_values.values]
        if value.HasField("int_values"):
            return _ByteOps.decode_int64s(value.int_values.values)
        if value.HasField("number_values"):
            return _ByteOps.decode_float64s(value.number_values.values)
        if value.HasField("text_values"):
            return list(value.text_values.values)
        if value.HasField("uuid_values"):
            return [uuid_lib.UUID(val) for val in value.uuid_values.values]
        if value.HasField("object_values"):
            return [
                self.__parse_nonref_properties_result(val) for val in value.object_values.values
            ]
>       _Warnings.unknown_type_encountered(value.WhichOneof("value"))
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^
E       ValueError: Protocol message ListValue has no "value" field.

self       = <weaviate.collections.query._QueryCollectionAsync object at 0x10602ad10>
value      = values {
  string_value: "tag2"
}
values {
  string_value: "tag3"
}

tsmith023, Sep 19 '25 13:09

ahhh interesting -- perfect that's a super easy change. Thank you!

DustinJamesT, Sep 19 '25 15:09