stargate icon indicating copy to clipboard operation
stargate copied to clipboard

Array-based filter path.[*] returns empty data

Open ErickRamirezDS opened this issue 4 years ago • 9 comments

This issue was reported by Debdeep Das on https://stackoverflow.com/questions/69167045/.

For document:

{
    "id": "1234",
    "vowels": "aeu",
    "alpha":[
        "xyz",
        "efgh"
    ],
    "new": [
        {"name":"nam1"},
        {"name":"nam2"}
    ]
}

The filter where={"alpha.[*]":{"$eq":"xyz"}} returns no data.

ErickRamirezDS avatar Sep 14 '21 00:09 ErickRamirezDS

I can confirm this is not-supported currently, the wildcards * and [*] work as expected if they are not at the end of the search path. For example, x.[*].z is valid..

~We will create a fix asap and it will be included in the one of the following releases..~

ivansenic avatar Sep 14 '21 07:09 ivansenic

There could be some serious indexing implications that might not make this actually possible. We count to have a leaf name (or array index) that's being searched for in order to utilize the indexes. If this is missing, we don't have any path related index to use and this can result in failing back to the value index. However, since value index matches different paths, it would not be as effective.

I will check with my team and come back with more information.

ivansenic avatar Sep 14 '21 10:09 ivansenic

Hi there, any news about this issue? Is that something only related to the Document API? Would it work with REST or GraphQL?

Thanks in advance!

mindflayer avatar Nov 11 '21 20:11 mindflayer

We still have not put this on the roadmap, due to the implications it has. If we implement this, these queries will be slower, as we can not utilize the index for the leaf name and would only have value index. But that matches all paths in all documents, thus the path matching would be done in memory. It's gonna be expensive operation, and this is why we did not prioritize this at the moment.

I am not sure what REST and GraphQL offer as the filtering possibilities, but I guess this kind of search is only possible in Document API. That said, note that Documents API uses it's own way to store the JSON in the Cassandra table, and this has nothing to do with how REST and GraphQL work.

@mindflayer If you say that this is a feature that is very important to you and stops you from progressing further I can revisit this and try to see if we can prioritize it asap.

ivansenic avatar Nov 12 '21 12:11 ivansenic

If you say that this is a feature that is very important to you and stops you from progressing further

Many thanks for asking, I really appreciate it. The first use case we encountered has been redesigned using a map instead of an array of values, after reading this issue. With the same approach, of course, we could solve different problems. I would still consider investing time in it if you think it could be solved in some way. What is nice in a Document DB is the fact that you don't need big transformations from the way you store data to the one consumed by your clients. I am not a Cassandra expert, and I imagine it would probably be different, but you may have a look at the way Elasticsearch deals with the same problem. By the way, I read this and it really helped to understand the way you organize the data under the hood: https://stargate.io/2020/10/19/the-stargate-cassandra-documents-api.html

mindflayer avatar Nov 12 '21 13:11 mindflayer

@mindflayer The Elasticsearch is not storing the data as we are. We are having a Cassandra as a data store, thus storing JSON is way more challenging. This implies that queries the data and supporting all the standard filters and aggregations is sometimes not as easy, or simply comes with an extra cost.

ivansenic avatar Nov 12 '21 13:11 ivansenic

Cannot you transform it to a list of maps?

{
    "tags": ["foo", "bar"]
}

Could be internally stored the same way you'd store this:

[{"tags": "foo"}, {"tags": "bar"}]

I guess this way the query would be: where={"[*].tags":{"$eq":"foo"}} instead of where={"tags.[*]":{"$eq":"foo"}}.

mindflayer avatar Nov 12 '21 14:11 mindflayer

@mindflayer Not really.. Have a look here -> https://www.datastax.com/blog/2021/04/stargate-documents-api-storage-mechanisms-search-filters-and-performance-improvements on how the data is stored in the C table.

ivansenic avatar Nov 16 '21 14:11 ivansenic

Checking to see if there is any movement on this issue. I cannot map or transform the JSON data that I'm storing. However, I need the ability to search the leaf nodes of the array.

GeorgeCrossIV avatar Sep 07 '22 03:09 GeorgeCrossIV