Cosmos DB: Query API not working if there's more than 1 partition

Open ItalyPaleAle opened this issue 2 years ago • 11 comments

This issue was reported by a user via Microsoft Support. The findings are reported below.

The issue seems to occur when Cosmos DB has multiple partitions.

(1) Create local Dapr environment and configure state store to cosmos db.

apiVersion: dapr.io/v1alpha1
kind: Component
metadata:
  name: statestore
spec:
  type: state.azure.cosmosdb
  version: v1
  metadata:
  - name: url
    value: https://something.documents.azure.com:443/
  - name: masterKey
    value: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - name: database
    value: db
  - name: collection
    value: collection

(2) Configure Cosmos DB scale to use multiple partitions by setting the throughput to Manual: 11000.

(3) After setting (2), wait about 30 minutes, as it takes time for the partitions to actually scale out.

(4) Then run Dapr.

(5) Call the Query State API:

https://docs.dapr.io/reference/api/state_api/#query-state

The request fails with the following error:

PS C:\Users\toruita> Invoke-WebRequest -Method Post -Headers @{"Content-type"="application/json"} -Uri 'http://localhost:3500/v1.0-alpha1/state/statestore/query?metadata.contentType=application/json' -Body '{}'
Invoke-WebRequest : {"errorCode":"ERR_STATE_QUERY","message":"failed query in state store statestore: context canceled"}
At line:1 char:1
+ Invoke-WebRequest -Method Post -Headers @{"Content-type"="application ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
    + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

ItalyPaleAle avatar Aug 02 '23 14:08 ItalyPaleAle

What is interesting is that we specifically consulted with the Cosmos DB folks on this. Since the SDK does not have built-in support for cross-partition queries, we manually added the required headers (via the policy options) to enable them.

Worth investigating further why that isn't working.
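
For readers unfamiliar with the header approach mentioned above, here is a minimal sketch in plain Go. It uses a standard-library `http.RoundTripper` instead of the Azure SDK policy options the comment refers to, so it is not the actual Dapr code. The header names `x-ms-documentdb-isquery` and `x-ms-documentdb-query-enablecrosspartition` come from the Cosmos DB REST API; the type `crossPartitionTransport` and the helper `markCrossPartition` are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

const (
	// Header the Cosmos DB REST API uses to flag a request as a query.
	headerIsQuery = "x-ms-documentdb-isquery"
	// Header that asks the gateway to allow a cross-partition query.
	headerCrossPartition = "x-ms-documentdb-query-enablecrosspartition"
)

// crossPartitionTransport is a hypothetical wrapper (not Dapr's code) that
// adds the cross-partition header to outgoing query requests.
type crossPartitionTransport struct {
	next http.RoundTripper
}

func (t crossPartitionTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	out := req.Clone(req.Context())
	markCrossPartition(out.Header)
	return t.next.RoundTrip(out)
}

// markCrossPartition sets the cross-partition header on query requests only.
func markCrossPartition(h http.Header) {
	if strings.EqualFold(h.Get(headerIsQuery), "true") {
		h.Set(headerCrossPartition, "True")
	}
}

func main() {
	// The client is only constructed here; no network call is made.
	client := &http.Client{Transport: crossPartitionTransport{next: http.DefaultTransport}}
	_ = client

	h := http.Header{}
	h.Set(headerIsQuery, "True")
	markCrossPartition(h)
	fmt.Println(headerCrossPartition+":", h.Get(headerCrossPartition))
}
```

As the thread goes on to explain, setting this header only asks the gateway to accept the query; it does not make the client aggregate or sort results across partitions.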

berndverst avatar Aug 02 '23 15:08 berndverst

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged (pinned, good first issue, help wanted or triaged/resolved) or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Sep 01 '23 15:09 github-actions[bot]

+1 for fixing this

joshuadmatthews avatar Oct 21 '23 19:10 joshuadmatthews

I reached out to the Azure CosmosDB service team to get their thoughts on our implementation and why this could be happening.

berndverst avatar Nov 13 '23 20:11 berndverst

It is possible that this cannot be fixed and that we would need to remove Query API support for Cosmos DB. That is an option given the API's Alpha status.

We may need to reimplement query support entirely to do the following:

  • Get each partition: https://learn.microsoft.com/en-us/rest/api/cosmos-db/get-partition-key-ranges
  • Then query each partition independently and combine results in memory. The results would not be sorted across the partitions.
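
The two steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: `rangeQuerier` is a hypothetical stand-in for whatever call would run a query against one partition key range, documents are represented as plain strings, and context handling, paging and continuation tokens are omitted.

```go
package main

import (
	"fmt"
	"sync"
)

// rangeQuerier is a hypothetical function that runs the query against a
// single partition key range and returns the matching documents.
type rangeQuerier func(rangeID string) ([]string, error)

// queryAllRanges sends the same query to every range concurrently and
// concatenates the results in the order of rangeIDs. It performs no
// sorting or aggregation across partitions.
func queryAllRanges(rangeIDs []string, q rangeQuerier) ([]string, error) {
	results := make([][]string, len(rangeIDs))
	errs := make([]error, len(rangeIDs))

	var wg sync.WaitGroup
	for i, id := range rangeIDs {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i], errs[i] = q(id)
		}(i, id)
	}
	wg.Wait()

	var merged []string
	for i, id := range rangeIDs {
		if errs[i] != nil {
			return nil, fmt.Errorf("query partition key range %q: %w", id, errs[i])
		}
		merged = append(merged, results[i]...)
	}
	return merged, nil
}

func main() {
	// Fake per-range data standing in for Cosmos DB responses.
	data := map[string][]string{"0": {"b", "a"}, "1": {"c"}}
	q := func(id string) ([]string, error) { return data[id], nil }

	docs, err := queryAllRanges([]string{"0", "1"}, q)
	fmt.Println(docs, err)
}
```

Note that the merged slice keeps each partition's own ordering, so an `ORDER BY` in the query would only hold within a partition, which matches the limitation stated above.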

This is a lot of work, especially for an API that we do not plan to bring to Beta or Stable.

berndverst avatar Nov 13 '23 20:11 berndverst

Another option - and this is probably the easiest:

  1. Check whether there is only a single partition
  2. If 1 partition: perform query with the native SDK method - not our custom code.
  3. Otherwise, throw an error -- we will not support cross-partition queries.
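
A rough sketch of that guard, assuming the partition count has already been obtained somehow: `queryFunc` is a hypothetical stand-in for the SDK's native single-partition query method, and the error value and function names are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errCrossPartitionUnsupported is returned when the container has more than
// one partition. The name and message are illustrative.
var errCrossPartitionUnsupported = errors.New(
	"state store has more than one partition: cross-partition queries are not supported")

// queryFunc is a hypothetical stand-in for the SDK's native query method.
type queryFunc func(query string) ([]string, error)

// querySinglePartitionOnly runs the native query when there is exactly one
// partition and rejects the request otherwise.
func querySinglePartitionOnly(partitionCount int, query string, native queryFunc) ([]string, error) {
	switch {
	case partitionCount < 1:
		return nil, fmt.Errorf("invalid partition count %d", partitionCount)
	case partitionCount > 1:
		return nil, errCrossPartitionUnsupported
	}
	return native(query)
}

func main() {
	native := func(q string) ([]string, error) { return []string{"result for " + q}, nil }

	docs, err := querySinglePartitionOnly(1, "{}", native)
	fmt.Println(docs, err)

	docs, err = querySinglePartitionOnly(3, "{}", native)
	fmt.Println(docs, err)
}
```

Returning a dedicated error value lets callers distinguish "unsupported topology" from a failed query with `errors.Is`.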

This approach might be acceptable because nobody should be using an Alpha component in a production scenario with multiple partitions.

berndverst avatar Nov 13 '23 20:11 berndverst

I’d vote for an actual fix. Cross partition queries are certainly supported in Cosmos, it would be nice for them to work with Dapr.

joshuadmatthews avatar Nov 13 '23 20:11 joshuadmatthews

> I’d vote for an actual fix. Cross partition queries are certainly supported in Cosmos, it would be nice for them to work with Dapr.

@joshuadmatthews cross-partition queries are not supported in the Cosmos DB Go SDK, and the Azure SDK team has no plans to add them to their roadmap. Cosmos DB does not perform these queries server-side; the SDKs that support them do so client-side, with a lot of code to aggregate and sort results in memory. You can read all about it in the Azure SDK repos. That is too much work for Dapr, however.

So the choices are single partition only, removing Query support entirely, or possibly rudimentary support where we send the same query to each partition but do not perform any further aggregation, sorting or filtering in Dapr.

Technically our current implementation should work, but the gateway server (not used by any of the official SDKs because of its severe query limitations) seems to time out. We have no choice but to change our approach.

If the Azure SDK for Go Team ever provides native cross partition query support we'd of course use that instead.

I want to remind the community again that Alpha in Dapr means experimental: we may discontinue Alpha features. We decided long ago that the Query API cannot progress to Beta given the way it was designed. It is not sustainable to support and maintain this. I must strongly discourage using the Query API.

berndverst avatar Nov 13 '23 21:11 berndverst

Can you share a link to the Azure SDK repo section you are talking about? Interested to read up on that. The dotnet v3 Cosmos SDK seems like an official SDK, and also seems to be using the header approach, but I'm sure I'm missing something there.

https://github.com/Azure/azure-cosmos-dotnet-v3/blob/e534de251bdadafd1adb960da83c15d463486a66/Microsoft.Azure.Cosmos/src/RequestOptions/QueryRequestOptions.cs#L173-L177

joshuadmatthews avatar Nov 13 '23 23:11 joshuadmatthews

I explained in my previous comment how the DotNet SDK manages to query all partitions and aggregate results. The Go SDK does not have the ability to do this.

As a result, this would need to be manually implemented in Dapr. That feels like the wrong approach, however. Instead, we need to wait for the Go SDK (github.com/azure/azure-sdk-for-go) to support this for Cosmos DB.

If anyone feels inclined to work on this, I suggest contributing to the Azure SDK for Go, and then Dapr can simply consume the updated SDK.

berndverst avatar Jan 25 '24 20:01 berndverst

If we are going to remove the Query API from Cosmos DB, then we need another way to filter items.

litan1106 avatar Mar 28 '24 22:03 litan1106