Array-based filter path.[*] returns empty data
This issue was reported by Debdeep Das on https://stackoverflow.com/questions/69167045/.
For document:
{
"id": "1234",
"vowels": "aeu",
"alpha":[
"xyz",
"efgh"
],
"new": [
{"name":"nam1"},
{"name":"nam2"}
]
}
The filter where={"alpha.[*]":{"$eq":"xyz"}} returns no data.
I can confirm this is not-supported currently, the wildcards * and [*] work as expected if they are not at the end of the search path. For example, x.[*].z is valid..
~We will create a fix asap and it will be included in the one of the following releases..~
There could be some serious indexing implications that might not make this actually possible. We count to have a leaf name (or array index) that's being searched for in order to utilize the indexes. If this is missing, we don't have any path related index to use and this can result in failing back to the value index. However, since value index matches different paths, it would not be as effective.
I will check with my team and come back with more information.
Hi there, any news about this issue? Is that something only related to the Document API? Would it work with REST or GraphQL?
Thanks in advance!
We still have not put this on the roadmap, due to the implications it has. If we implement this, these queries will be slower, as we can not utilize the index for the leaf name and would only have value index. But that matches all paths in all documents, thus the path matching would be done in memory. It's gonna be expensive operation, and this is why we did not prioritize this at the moment.
I am not sure what REST and GraphQL offer as the filtering possibilities, but I guess this kind of search is only possible in Document API. That said, note that Documents API uses it's own way to store the JSON in the Cassandra table, and this has nothing to do with how REST and GraphQL work.
@mindflayer If you say that this is a feature that is very important to you and stops you from progressing further I can revisit this and try to see if we can prioritize it asap.
If you say that this is a feature that is very important to you and stops you from progressing further
Many thanks for asking, I really appreciate it. The first use case we encountered has been redesigned using a map instead of an array of values, after reading this issue. With the same approach, of course, we could solve different problems. I would still consider investing time in it if you think it could be solved in some way. What is nice in a Document DB is the fact that you don't need big transformations from the way you store data to the one consumed by your clients. I am not a Cassandra expert, and I imagine it would probably be different, but you may have a look at the way Elasticsearch deals with the same problem. By the way, I read this and it really helped to understand the way you organize the data under the hood: https://stargate.io/2020/10/19/the-stargate-cassandra-documents-api.html
@mindflayer The Elasticsearch is not storing the data as we are. We are having a Cassandra as a data store, thus storing JSON is way more challenging. This implies that queries the data and supporting all the standard filters and aggregations is sometimes not as easy, or simply comes with an extra cost.
Cannot you transform it to a list of maps?
{
"tags": ["foo", "bar"]
}
Could be internally stored the same way you'd store this:
[{"tags": "foo"}, {"tags": "bar"}]
I guess this way the query would be: where={"[*].tags":{"$eq":"foo"}} instead of where={"tags.[*]":{"$eq":"foo"}}.
@mindflayer Not really.. Have a look here -> https://www.datastax.com/blog/2021/04/stargate-documents-api-storage-mechanisms-search-filters-and-performance-improvements on how the data is stored in the C table.
Checking to see if there is any movement on this issue. I cannot map or transform the JSON data that I'm storing. However, I need the ability to search the leaf nodes of the array.