Target Based Scaling not working with Flex Consumption Functions for Service Bus Single dispatch processing
We have a set of functions, all running on Flex Consumption, that use Service Bus topics (single-message dispatch) as a trigger.
Each function does the following:
- Receive a message from Service Bus
- Update Azure SQL
- Either add another message to a Service Bus topic, or not
- Complete the message
Under load, Azure SQL was running out of DTUs due to the number of function instances running concurrently.
Followed the guidance at https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/azure-functions/functions-target-based-scaling.md#service-bus-queues-and-topics
host.json for each function is configured as follows:

```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      },
      "enableLiveMetricsFilters": true
    }
  },
  "extensions": {
    "serviceBus": {
      "maxConcurrentCalls": 1
    }
  }
}
```
The application configuration has the following setting added:

```json
{
  "name": "WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT",
  "value": "1",
  "slotSetting": false
}
```
No difference in application scaling was observed; Azure SQL is still running out of DTUs because the function apps scale out, apparently ignoring the configuration.
Repro steps
Add 300 messages to the initial Service Bus topic.
Expected behavior
Service Bus message processing concurrency is constrained by the configuration above.
Actual behavior
SQL DTU usage at 100% (S4 SKU with a 200 DTU limit). Many functions running concurrently at the same time, observable in the invocation logs.
Investigative information
- Function App version: Flex Consumption
- Function App name: plan-clapoc-core-test-lla-storeandforward-uksouth, plan-clapoc-core-test-lla-preprocessor-uksouth, plan-clapoc-core-test-lla-enrich-uksouth, plan-clapoc-core-test-lla-search-uksouth, plan-clapoc-core-test-lla-reduce-uksouth
- Function name(s) (as appropriate): func-clapoc-core-test-lla-storeandforward, func-clapoc-core-test-lla-preprocessor, func-clapoc-core-test-lla-enrich, func-clapoc-core-test-lla-search, func-clapoc-core-test-lla-reduce
- Region: UK South
- NuGet Service Bus extension: Microsoft.Azure.Functions.Worker.Extensions.ServiceBus 5.22.0
Known workarounds
The only workaround found is to limit the rate of ingress into the initial Service Bus topic.
Related information
- Programming language used: C#
- Bindings used: all functions follow the same pattern but use a different Service Bus subscription/filter
```csharp
[Function(nameof(StoreAndForward))]
public async Task Run(
    [ServiceBusTrigger("sbt-clapoc-storeandforward", "storeandforward", Connection = "ServiceBus_ConnectionString")]
    ServiceBusReceivedMessage message,
    ServiceBusMessageActions messageActions)
{
    await base.RunFunction(message, messageActions);
}
```
@andynorrisjumar thank you for reporting this. To control the maximum scale-out of Flex Consumption, please check this documentation. By default, apps running in a Flex Consumption plan have a limit of 100 overall instances. Currently the lowest maximum instance count value is 40, so to set the app to that lowest possible maximum scale you would use:
```shell
az functionapp create --resource-group <RESOURCE_GROUP> --name <APP_NAME> --storage <STORAGE_ACCOUNT_NAME> --runtime <LANGUAGE_RUNTIME> --runtime-version <RUNTIME_VERSION> --flexconsumption-location <REGION> --maximum-instance-count 40
```
With this, if you put in 300 messages, with the concurrency you shared in your host.json, the app would scale to a maximum of 40 instances, each handling one message at a time; as each message gets processed, the instances pick the next message from the queue until all 300 messages are done.
Or, if you are using ARM or Bicep, this setting is `maximumInstanceCount` inside `scaleAndConcurrency` of the new `functionAppConfig` section.
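For reference, a minimal Bicep sketch of that section (the resource name, API version, symbolic references, and `instanceMemoryMB` value are illustrative assumptions, not taken from this thread):

```bicep
resource functionApp 'Microsoft.Web/sites@2023-12-01' = {
  name: appName
  location: location
  kind: 'functionapp,linux'
  properties: {
    serverFarmId: flexPlan.id
    functionAppConfig: {
      scaleAndConcurrency: {
        // Lowest value currently accepted for Flex Consumption
        maximumInstanceCount: 40
        instanceMemoryMB: 2048
      }
      // deployment and runtime sections omitted for brevity
    }
  }
}
```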
Is this something you can test?
Hi Thiago,
Thank you for the explanation; I think I can see where the problem lies (40 being the smallest allowed maximum). We are set up using Bicep and already have the maximum set to 40.
In our use case we have a single Service Bus namespace and 5 topics. Each topic is backed by an Azure Flex Consumption plan. Each function does the following:
- Peek a message from the queue
- Look up config in SQL
- Process the message and potentially create a new message for the next topic, or end processing
- Complete the message
The same SQL DB is shared across the entire system. The symptom we were seeing was SQL exhausting connections and CPU.
So, using your explanation, you can see that within a few seconds of starting we would have been up to 200 concurrent connections and processing (5 apps × 40 instances). Your choice of 40 as the minimum value for maximum instance count on Flex Consumption does not really suit scenarios where we are trying to throttle concurrency, as 40 is still quite a large number. And since you can't deploy multiple function projects to a single Flex Consumption plan, shared resources across plans are likely to get impacted.
With the information you have given I can arrange a test to confirm, but I believe the large minimum value of maximum-instance-count is the root cause here.
Regards
Andy Norris
I understand. It is indeed on our backlog to allow this setting to go lower, but unfortunately 40 will remain the lowest possible value for maximum instance count for now. One thing to consider is having only one Flex Consumption app, limited to 40, with five functions in that same app, each triggering from a different topic. This would mean that instead of a possible 200 concurrent calls to SQL you would be limited to 40. Worth testing.
@nzthiago : Do you have an update on this behaviour?
We're experiencing a similar problem where we have a few hundred thousand messages on the Service Bus. The function app picks up the messages, communicates with an Azure SQL database (currently a DTU Standard S2), and sends the message to another Service Bus queue. Upon starting, the function app successfully processes roughly 90k messages, but also puts 10k on the DLQ because the called services (SQL and Service Bus) are flooded.
Having a performant Flex Consumption function is good, but of course we need to be able to integrate it with other services as well. So what are the current options for throttling it as much as possible?
The feature to allow a value lower than 40 is still in the backlog. One consideration we have in mind is that allowing a low maximum instance count could more often affect the Flex Consumption per-function scaling feature. Would it be better to:
- Allow the Maximum Instance Count setting to continue as it is and apply to the entire function app. Or
- Allow the Maximum Instance Count setting to be configurable per function / function group?
The former would be faster to implement and roll out, but it could impact some of your functions in the function app, so you would need to ensure that maximum instance count caters for all the functions/function groups in your app. For example, if you set the maximum instance count on the app to 1, and an HTTP triggered function causes the app to scale to 1 to handle its executions, the app would not be able to scale more to add instances for other trigger functions in the app and those functions would never execute. Is this acceptable?
In the meantime, some possible options:
- If only a maximum of one is required, consider a timer-trigger-based solution. Set up a timer-triggered function; it will read the messages and process a batch of them in whatever order you need.
- Use Durable Functions and a manually created semaphore as a durable entity, like this example
- Implement a circuit breaker. For .NET, for example, this is possible with Polly using a break duration
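As a sketch of the timer-trigger option above: a timer-triggered function that drains the topic subscription strictly one message at a time. The schedule, batch size, entity names, and `ProcessAsync` helper are illustrative assumptions, not taken from this thread.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Microsoft.Azure.Functions.Worker;

public class SequentialProcessor
{
    [Function(nameof(DrainSubscription))]
    public async Task DrainSubscription([TimerTrigger("0 */5 * * * *")] TimerInfo timer)
    {
        await using var client = new ServiceBusClient(
            Environment.GetEnvironmentVariable("ServiceBus_ConnectionString"));
        await using var receiver = client.CreateReceiver(
            "sbt-clapoc-storeandforward", "storeandforward");

        // Pull a bounded batch, then process strictly sequentially so the
        // downstream SQL database only ever sees one call at a time.
        var messages = await receiver.ReceiveMessagesAsync(
            maxMessages: 50, maxWaitTime: TimeSpan.FromSeconds(5));
        foreach (var message in messages)
        {
            await ProcessAsync(message);               // your SQL update etc.
            await receiver.CompleteMessageAsync(message);
        }
    }

    // Placeholder for the real processing logic (hypothetical helper)
    private Task ProcessAsync(ServiceBusReceivedMessage message) => Task.CompletedTask;
}
```

Because the function app never scales beyond what the timer invokes, concurrency is bounded regardless of queue depth.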
Hi @nzthiago ,
It's indeed a trade-off that needs to happen, but just my two cents here:
- In an ideal situation we could configure the maximum instance count per function / function group. That allows for granular scaling and can cope with almost every scenario.
- If that's harder to implement, then lowering the maximum instance count at the function app level is a good workaround. If users want a maximum instance count of 1 with multiple functions in the app, it might be their own responsibility to increase it?
Anyway, while awaiting the final fix, we were able to work around it by setting the `AzureFunctionsJobHost__extensions__serviceBus__maxConcurrentCalls` environment variable to 1. As we saw, the app may still scale to 40, but at least the dependencies were able to handle the load.
PS: we saw there was an issue with timer-triggered functions in #10527 as well?
Thank you @YvesVanStappen for the input - certainly helps to know what would be ideal for us to implement here for maximum instance count. Thanks for pinging on the Timer trigger one, we're releasing a fix and I added a comment there.
@nzthiago do you know the status of the max instance count for flex consumption plan at this point? And do you know if there any plans/ways to customize the scale behaviour? For we need the scale to respond to number of ServiceBus Sessions in a queue not the total number of messages, otherwise our app scales way too quickly. Limiting the number of instances to 40 is better but still not ideal as the app in our case scales too aggressively.
@nzthiago - just upvoting this issue - be great to be able to limit scale to 1.
Our use case involves processing messages off a service bus and we can't guarantee order if 40 of those messages can be run in parallel. I suspect there are code changes we can make, but this is a legacy service and we can do that with provisioned function app plans (which is what we will use for now).
Thanks for your help with this.
> @nzthiago do you know the status of the max instance count for flex consumption plan at this point? And do you know if there any plans/ways to customize the scale behaviour? For we need the scale to respond to number of ServiceBus Sessions in a queue not the total number of messages, otherwise our app scales way too quickly. Limiting the number of instances to 40 is better but still not ideal as the app in our case scales too aggressively.
While the platform might scale the app to more instances than the number of sessions, there shouldn't be more instances than sessions actively processing messages. Each instance of the function app, as a session receiver, processes a specific group of sessions at a time (which you can control in host.json), and the number of instances actively processing the sessions shouldn't exceed the number of sessions. So if it over-scales you shouldn't be concerned, as you won't be charged for the idle instances.
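For session-enabled triggers, the per-instance session concurrency mentioned above can be limited in host.json. A minimal sketch, assuming Service Bus extension v5+ (the values shown are illustrative, not from this thread):

```json
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "maxConcurrentSessions": 8,
      "sessionIdleTimeout": "00:00:30"
    }
  }
}
```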
I just ran two tests. Test 1 - I placed 1000 messages in Service Bus, 500 with sessionid=1 and 500 with sessionid=2, then enabled my Node function app. Sure enough, in App Insights live metrics I could see the app scaled out to 100 (the max scale-out I had on the app) instead of to just 2. But in the telemetry I could see only two instances were actively processing from the queue.
Test 2 - I set my app's max scale-out to 40, stopped the app, and used my loader app to add 15 sessions with 500 messages per session to the Service Bus queue (7,500 messages total). Then I started my app and again observed that at most 15 worker instances were actively processing from Service Bus.
So you could still set the max scale-out to 40 (the lowest value we currently allow) on the app just to be safe, and then use sessions, given you wouldn't be paying for instances that aren't actively executing functions on Flex. It's just that on the platform side it over-scales, so there would be idle workers.
That said, we are actively working on lowering that 40. It will unfortunately still take a few more months to finalize and roll out.
@nzthiago Fantastic! Thank you for the thorough and informative answer. I was under the impression that the GB-second cost calculation included idle instances too; happy to learn it's only for actively processing executions. Thank you!
@nzthiago - Sounds like that could work for us - thank you!
(maybe i'm hijacking an issue, feel free to split)
I'm running into almost the same issue, but with the simpler queue from a storage account.
I'm dumping a few hundred items into a storage account queue to be processed by a function. In host.json I've set `"batchSize": 1` (in the `extensions` → `queues` section) to limit the number of items processed concurrently.
Maybe the batchSize per instance is respected, but Flex Consumption instantly scales out to many instances, effectively DDoS-ing the API called by the processing function.
What is the suggested way to process the items from the queue one by one using a Flex Consumption plan?
Setting batchSize to 1 will just make the app scale out to as many instances as needed to process all the messages in the queue, up to the app's maximum instance count setting - kind of the opposite of what you want: if you put 100 messages in, the app will try to scale to 100 so each instance processes one message.
The maximum instance count can be set in the portal, for example, but the lowest it can currently be set to is 40 (discussed above) until we implement the feature to reduce that to 1.
If 40 messages processed at the same time doesn't work for you, the options are the same ones I call out in this post above, the simplest being a timer-triggered function instead of a queue trigger.
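To make that timer-trigger suggestion concrete for the storage-queue case, here is a sketch that receives a bounded batch and processes it sequentially. The queue name, schedule, and `CallDownstreamApiAsync` helper are illustrative assumptions, not from this thread.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Microsoft.Azure.Functions.Worker;

public class QueueDrainer
{
    [Function(nameof(DrainStorageQueue))]
    public async Task DrainStorageQueue([TimerTrigger("0 */1 * * * *")] TimerInfo timer)
    {
        var queue = new QueueClient(
            Environment.GetEnvironmentVariable("AzureWebJobsStorage"), "my-queue");

        // Receive up to 32 messages (the service maximum per call) and
        // process them strictly one at a time.
        var response = await queue.ReceiveMessagesAsync(maxMessages: 32);
        foreach (var message in response.Value)
        {
            await CallDownstreamApiAsync(message.MessageText); // your throttled API call
            await queue.DeleteMessageAsync(message.MessageId, message.PopReceipt);
        }
    }

    // Placeholder for the real API call (hypothetical helper)
    private Task CallDownstreamApiAsync(string body) => Task.CompletedTask;
}
```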
We still have this high in our backlog (allowing the lowest value for maximum instance count to be 1), hoping to tackle it in the first half of next year, but we are working through some larger platform features first.
Thanks for the detailed reply! I went ahead and implemented a timer function to process the queue in a more controlled way. That works. All these intricacies of the 'new' Flex Consumption aren't always as clear as they could be in the documentation.
We are looking to migrate from Consumption to Flex Consumption, but sadly a hard requirement is that we can set it to 1. The system we integrate with is heavily throttled, and we have around 45 separate Azure function apps (integrations) targeting it; everything is event-based, where a timer trigger sets a bunch of events in motion that the integrations then handle with the Service Bus trigger. After almost a year of tweaking and optimizing, we must set it to 1.. :( So what is the current timeline @nzthiago? 😀
We have now enabled, in West Europe, Australia East, and West US 2, the ability to reduce a Flex Consumption function app's maximum instance count to less than 40 via the AZ CLI or ARM/Bicep/Terraform (AzureRM latest version), in a preview/testing capacity.
Example to set to 1:
```shell
az functionapp scale config set --name <function app name> --resource-group <resource group name> --maximum-instance-count 1
```
If possible, please test in one of those regions and let me know. We plan to roll it out to all regions over time, hopefully in the next couple of months and also update tooling and the Azure Portal if there are no major issues or concerns.
Note: setting maximumInstanceCount to 1 (or anything below 40) obviously creates situations where the app is unable to scale out as load increases. That can lead to increased request failures and scale throttling, especially for HTTP functions, so while technically possible, it should be used only for very specific, well-understood workloads. Also note that this applies to the combined number of instances across all functions and function groups in the same Flex Consumption app. So if setting it to, say, 1 instance, I'd make sure the function app has only one function in it, or only functions that belong to the same function group.