azure-cosmos-dotnet-repository icon indicating copy to clipboard operation
azure-cosmos-dotnet-repository copied to clipboard

Share throughput over containers

Open mabvanaartrijk opened this issue 2 years ago • 11 comments

Is your feature request related to a problem? Please describe. If ContainerPerItemType = true, the containers are created with their own throughput (autoscale or manual). However, I would like to share the total database throughput over the containers. This is a setting in Cosmos, but I can't get it to work.

Describe the solution you'd like Add a setting, or make it possible to use manually created containers (see below).

Describe alternatives you've considered I've tried to manually create the containers in the Data Explorer, and use those as Cosmos repositories, but this doesn't work.

Additional context Thanks for this great library🙌

mabvanaartrijk avatar Oct 29 '21 15:10 mabvanaartrijk

Hey, @mabvanaartrijk I like this idea, so what you are saying is rather than setting the throughput at each level of each container, you would like the throughput at the database level that all the containers share? (I do just need to check on how this works in terms of database level throughput not something I have done before)

Around the issue with not allowing manually created container's could you provide more information on this, please?

mumby0168 avatar Oct 30 '21 14:10 mumby0168

Hi @mumby0168 . Yes, if you create a new container in the Data Explorer on Azure Portal you have the option to provision dedicated throughput for the container. So a RepositoryOptions__ShareDatabaseThroughput option would be really helpful.

Steps to reproduce the issue:

  1. Create a container in Data Explorer with the option Provision dedicated throughput... off.
  2. Add a CosmosRepository, for example:

builder.Services.AddCosmosRepository(options => { options.CosmosConnectionString = "CosmosConnectionString"; options.ContainerPerItemType = true; options.DatabaseId = "database"; options.SyncAllContainerProperties = true; options.AllowBulkExecution = true; options.ContainerBuilder.Configure<TestItem>(containerOptions => containerOptions .WithContainer("Test") .WithPartitionKey("/testPk") );})

  1. Create an item using await _repository.CreateAsync(testItem)
  2. This fails with the message: Response status code does not indicate success: NotFound (404); Substatus: 0; ActivityId: ; Reason: (Throughput is not configured for Test);

Thanks!

mabvanaartrijk avatar Nov 01 '21 09:11 mabvanaartrijk

Hi @mumby0168 . Yes, if you create a new container in the Data Explorer on Azure Portal you have the option to provision dedicated throughput for the container. So a RepositoryOptions__ShareDatabaseThroughput option would be really helpful.

Steps to reproduce the issue:

  1. Create a container in Data Explorer with the option Provision dedicated throughput... off.
  2. Add a CosmosRepository, for example:

builder.Services.AddCosmosRepository(options => { options.CosmosConnectionString = "CosmosConnectionString"; options.ContainerPerItemType = true; options.DatabaseId = "database"; options.SyncAllContainerProperties = true; options.AllowBulkExecution = true; options.ContainerBuilder.Configure<TestItem>(containerOptions => containerOptions .WithContainer("Test") .WithPartitionKey("/testPk") );})

  1. Create an item using await _repository.CreateAsync(testItem)
  2. This fails with the message: Response status code does not indicate success: NotFound (404); Substatus: 0; ActivityId: ; Reason: (Throughput is not configured for Test);

Thanks!

Hey @mabvanaartrijk thanks for the detailed response, ah okay I see what is happening here.

So the option you have set options.SyncAllContainerProperties = true; will be trying to sync all containers with the default throughput value of 400 RU/s and a manual throughput as the properties.

Is the desired behaviour you would expect to be that if you have configured the setting RepositoryOptions__ShareDatabaseThroughput that then the database would be synced effectively and not the containers?

mumby0168 avatar Nov 01 '21 09:11 mumby0168

Thanks @mumby0168! I've misinterpreted the SyncAllContainerProperties setting. It works perfectly with this setting to false.

So to recap: manually creating the collections in Data Explorer and setting SyncAllContainerProperties to false works. It would be a nice feature if you can set RepositoryOptions__ShareDatabaseThroughput and use a shared database throughput setting for all underling containers, that are created automatically if they don't exist.

mabvanaartrijk avatar Nov 01 '21 14:11 mabvanaartrijk

Thanks @mumby0168! I've misinterpreted the SyncAllContainerProperties setting. It works perfectly with this setting to false.

So to recap: manually creating the collections in Data Explorer and setting SyncAllContainerProperties to false works. It would be a nice feature if you can set RepositoryOptions__ShareDatabaseThroughput and use a shared database throughput setting for all underling containers, that are created automatically if they don't exist.

Great news @mabvanaartrijk yeah it certainly could be would this be something you'd like to look to implement? We are happy to accept any PR's

mumby0168 avatar Nov 01 '21 14:11 mumby0168

I'll discuss with my team if we can add it to one of our next sprints👍

mabvanaartrijk avatar Nov 01 '21 15:11 mabvanaartrijk

I'll discuss with my team if we can add it to one of our next sprints👍

Great look forward to hearing from you 👍

mumby0168 avatar Nov 02 '21 21:11 mumby0168

Hi @mumby0168 @mabvanaartrijk @IEvangelist

I've been looking at exactly this need today.

Currently all our containers are getting their own throughput which is looking like an expensive hobby $!

Database level throughput would be great for test environments in particular but likely also prod until we know what the userbase will be like as cost is a consideration for my client.

I think possibly more than a bool might be a nice option.

Initially I was looking to populate using ARM templates but if we could get it into the repo that would be a much better option for us.

ARM examples are as per:

@description('Throughput value when using Manual Throughput Policy for the container')
@minValue(400)
@maxValue(1000000)
param manualProvisionedThroughput int = 400

@description('Maximum throughput when using Autoscale Throughput Policy for the container')
@minValue(1000)
@maxValue(1000000)
param autoscaleMaxThroughput int = 4000

@description('The throughput policy for the container')
@allowed([
  'Manual'
  'Autoscale'
])
param throughputPolicy string = 'Manual'

var throughput_Policy = {
  Manual: {
    throughput: manualProvisionedThroughput
  }
  Autoscale: {
    autoscaleSettings: {
      maxThroughput: autoscaleMaxThroughput
    }
  }
}

CosmosClient.CreateDatabaseIfNotExistsAsync has an overload to accept ThroughputProperties.

namespace Microsoft.Azure.Cosmos
{
    //
    // Summary:
    //     Represents a throughput of the resources in the Azure Cosmos DB service. It is
    //     the standard pricing for the resource in the Azure Cosmos DB service.
    //
    // Remarks:
    //     It contains provisioned container throughput in measurement of request units
    //     per second in the Azure Cosmos service.
    public class ThroughputProperties
    {
        //
        // Summary:
        //     Gets the entity tag associated with the resource from the Azure Cosmos DB service.
        //
        // Value:
        //     The entity tag associated with the resource.
        //
        // Remarks:
        //     ETags are used for concurrency checking when updating resources.
        [JsonProperty(PropertyName = "_etag", NullValueHandling = NullValueHandling.Ignore)]
        public string ETag { get; }
        //
        // Summary:
        //     Gets the last modified time stamp associated with Microsoft.Azure.Cosmos.DatabaseProperties
        //     from the Azure Cosmos DB service.
        //
        // Value:
        //     The last modified time stamp associated with the resource.
        [JsonConverter(typeof(UnixDateTimeConverter))]
        [JsonProperty(PropertyName = "_ts", NullValueHandling = NullValueHandling.Ignore)]
        public DateTime? LastModified { get; }
        //
        // Summary:
        //     Gets the provisioned throughput for a resource in measurement of request units
        //     per second in the Azure Cosmos service.
        [JsonIgnore]
        public int? Throughput { get; }
        //
        // Summary:
        //     The maximum throughput the autoscale will scale to.
        [JsonIgnore]
        public int? AutoscaleMaxThroughput { get; }
        //
        // Summary:
        //     Gets the self-link associated with the resource from the Azure Cosmos DB service.
        //
        // Value:
        //     The self-link associated with the resource.
        //
        // Remarks:
        //     A self-link is a static addressable Uri for each resource within a database account
        //     and follows the Azure Cosmos DB resource model. E.g. a self-link for a document
        //     could be dbs/db_resourceid/colls/coll_resourceid/documents/doc_resourceid
        [JsonProperty(PropertyName = "_self", NullValueHandling = NullValueHandling.Ignore)]
        public string SelfLink { get; }

        //
        // Summary:
        //     The Throughput properties for autoscale provisioned throughput offering
        //
        // Parameters:
        //   autoscaleMaxThroughput:
        //     The maximum throughput the resource can scale to.
        //
        // Returns:
        //     Returns a ThroughputProperties for autoscale provisioned throughput
        public static ThroughputProperties CreateAutoscaleThroughput(int autoscaleMaxThroughput);
        //
        // Summary:
        //     The Throughput properties for manual provisioned throughput offering
        //
        // Parameters:
        //   throughput:
        //     The current provisioned throughput for the resource.
        //
        // Returns:
        //     Returns a ThroughputProperties for manual throughput
        public static ThroughputProperties CreateManualThroughput(int throughput);
    }
}

Looking at: Microsoft.Azure.CosmosRepository.Services.DefaultCosmosContainerService

I wondered if that might be the spot...

public async Task<Container> GetContainerAsync<TItem>(bool forceContainerSync = false) where TItem : IItem
        {
            try
            {
                ItemOptions itemOptions = _cosmosItemConfigurationProvider.GetOptions<TItem>();

                Database database =
                    await _cosmosClientProvider.UseClientAsync(
                        client => client.CreateDatabaseIfNotExistsAsync(_options.DatabaseId, SOME THROUGHPUT PROPERTY OVERRIDE OBJECT GOES HERE)).ConfigureAwait(false);

What do we think? I'm struggling for time myself right now as I'll be away for a bit soon but wanted to chip into the discussion with some thoughts.

totalitydevelopment avatar Nov 16 '21 09:11 totalitydevelopment

Hi @mabvanaartrijk - sorry for not replying earlier. Yes, that looks like the correct place.

IEvangelist avatar Apr 18 '22 12:04 IEvangelist

I would say that in the absence of config to the contrary you should default to shared throughput.

hades200082 avatar Jul 01 '22 19:07 hades200082

Any word on the adding the feature for RepositoryOptions__ShareDatabaseThroughput?

kalyankdas avatar Sep 09 '22 16:09 kalyankdas