documentdb-lumenize

Performance on Large Dataset

Open • buzzcola opened this issue • 0 comments

Continued from this conversation: http://stackoverflow.com/questions/39669376/documentdb-stored-procedure-continuation

I implemented your suggestion to pass the Response object in as the new configuration for the stored procedure when there's a continuation. My code looks like this now:

var roundtrips = 0;

var timer = Stopwatch.StartNew();

// Initial configuration: sum 'Amount' grouped by 'year' over the documents matched by filterQuery.
var configString = @"{
        cubeConfig: {
            groupBy: 'year',
            field: 'Amount',
            f: 'sum'
        },
        filterQuery: 'select * from TestLargeData t where t.Amount > 0'
    }";

var config = JsonConvert.DeserializeObject<object>(configString);
Console.WriteLine($"Query #{roundtrips + 1}...");
var result = await _client.ExecuteStoredProcedureAsync<dynamic>("dbs/foo/colls/bar/sprocs/baz", config);
roundtrips++;

// Keep calling the stored procedure for as long as its response carries a continuation.
while (result.Response["continuation"] != null)
{
    // Make a new config which is the entire response from the last call.
    var nextConfig = JsonConvert.DeserializeObject(result.Response.ToString());
    Console.WriteLine($"Query #{roundtrips + 1}...");
    result = await _client.ExecuteStoredProcedureAsync<dynamic>("dbs/foo/colls/bar/sprocs/baz", nextConfig);
    roundtrips++;
}

timer.Stop();
// Report how many round trips were needed and the total wall-clock time.
Console.WriteLine($"{roundtrips} round trips in {timer.Elapsed}");

As of this writing, my query is on round trip #123 and is taking about 11 seconds per trip.
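For scale, the elapsed time implied by those two figures (a lower bound, since the loop had not yet finished):

$$
123 \times 11\,\text{s} = 1353\,\text{s} \approx 22.6\,\text{min}
$$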

As mentioned in the Stack Overflow post, my collection has 1M documents with a very simple structure:

{
    "year": 2007,
    "SomeOtherField1": "SomeOtherValue1",
    "SomeOtherField2": "SomeOtherValue2",
    "Amount": 12000,
    "id": "0ee80b66-7fa7-40c1-9124-292c01059562",
    "_rid": "...",
    "_self": "...",
    "_etag": "\"...\"",
    "_attachments": "attachments/",
    "_ts": ...
}

The collection is provisioned at 1,000 RU/s. Its indexing policy is as follows:

{
  "indexingMode": "consistent",
  "automatic": true,
  "includedPaths": [
    {
      "path": "/*",
      "indexes": [
        {
          "kind": "Range",
          "dataType": "Number",
          "precision": -1
        },
        {
          "kind": "Hash",
          "dataType": "String",
          "precision": 3
        }
      ]
    }
  ],
  "excludedPaths": []
}
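A rough upper bound on the throughput each round trip can draw on, assuming the collection's provisioned 1,000 RU/s is the only limit, the roughly 11-second trips reported above, and ignoring any burst allowance:

$$
1000\,\text{RU/s} \times 11\,\text{s} \approx 11{,}000\,\text{RU per round trip},
\qquad
123 \times 11{,}000\,\text{RU} \approx 1.35 \times 10^{6}\,\text{RU so far}
$$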

Is there anything obviously wrong with what I'm doing here?

Thanks very much for your help, I appreciate it!

buzzcola, Sep 26 '16 16:09