Question about Flex Consumption Function App Service Bus Trigger
Is your question related to a specific version? If so, please specify:
What language does your question apply to? (e.g. C#, JavaScript, Java, All)
C#
Question
Hi there, I asked this question over here, but this might be a more appropriate place to ask; if not, my apologies. The question is about Service Bus messages staying in the queue for longer on Flex Consumption compared to Consumption. The details:
I have multiple functions that subscribe to Service Bus topics using the ServiceBusTrigger. All function apps are running the .NET 8 isolated model, and the Service Bus namespace is on the Standard tier. Each function app is pinged from Application Insights every 10 minutes.
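For reference, the functions are essentially shaped like this in the isolated worker model (the entity, connection, and class names here are placeholders, not the real ones):

```csharp
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class OrderProcessor
{
    private readonly ILogger<OrderProcessor> _logger;

    public OrderProcessor(ILogger<OrderProcessor> logger) => _logger = logger;

    // Fires when a message lands on the topic subscription.
    [Function("ProcessOrder")]
    public void Run(
        [ServiceBusTrigger("my-topic", "my-subscription", Connection = "ServiceBusConnection")]
        string message)
    {
        _logger.LogInformation("Received: {Message}", message);
    }
}
```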
I have recently migrated from the Windows Consumption plan to the Flex Consumption plan.
On the Consumption plan, Service Bus requests would also drop when the app scaled down to 0; on Flex Consumption, they don't drop when the function apps scale down to 0, only when I turn the apps off.
I understand that Service Bus functions are now scaled independently of other trigger types on the Flex Consumption plan. What I am noticing on Flex is that there is some delay before a message is even picked up from the subscription by the trigger in the function; I have seen delays of up to almost 2 minutes. I never observed delays of this kind on the Consumption plan.
Is this expected? Is there any configuration or setting I can change in the function app so it checks for messages more frequently? Even with an always ready instance enabled, the delay still seems to be there, although somewhat reduced; then again, I haven't experimented with this much.
Appreciate any insight into how this works internally.
Thank you!
My host.json file:
```json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      },
      "enableLiveMetricsFilters": true
    }
  }
}
```
and appsettings.json:
```json
{
  "Logging": {
    "LogLevel": {
      "Default": "Information",
      "Function.GetHealthFunction": "Error",
      "Azure.Messaging.ServiceBus": "Warning",
      "Azure.Core": "Warning"
    }
  }
}
```
These files were not changed as part of the migration to Flex Consumption.
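I have not set any Service Bus extension options in host.json, so defaults apply. As I understand it, the extension settings below only tune per-instance behaviour (prefetch and concurrency) rather than how quickly the platform scales from zero, but for reference this is roughly what tuning them would look like (illustrative values):

```json
{
  "version": "2.0",
  "extensions": {
    "serviceBus": {
      "prefetchCount": 100,
      "maxConcurrentCalls": 16
    }
  }
}
```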
For context, I have randomly picked an event from Application Insights from before the changeover to Flex.
I have created a small sample to help reproduce this issue: https://github.com/dsdilpreet/flex-consumption-service-bus-sample
The sample contains a Bicep script to deploy the relevant infra and two function apps, one running on Consumption and the other on Flex Consumption. As you can see from the results below, Flex Consumption is significantly slower: it took 8 and 11 seconds for Flex and just milliseconds for Consumption to trigger after the message was enqueued to the same topic. I made sure that in both cases exactly one instance was running for each function app, so there was no cold start involved.
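The test itself just sends a message to the topic and compares the enqueue time against the trigger timestamp in Application Insights. A minimal sender sketch, not necessarily the exact code in the repo (topic and setting names are placeholders):

```csharp
// Illustrative sender only: enqueue a test message whose enqueue time can
// later be compared with the function trigger timestamp in App Insights.
using Azure.Messaging.ServiceBus;

string connectionString = Environment.GetEnvironmentVariable("ServiceBusConnection")!; // placeholder setting name
await using var client = new ServiceBusClient(connectionString);
ServiceBusSender sender = client.CreateSender("my-topic"); // placeholder topic name
await sender.SendMessageAsync(new ServiceBusMessage($"test sent at {DateTime.UtcNow:O}"));
```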
Test 1: Consumption vs Flex Consumption (App Insights screenshots)
Test 2: Consumption vs Flex Consumption (App Insights screenshots)
Hi @nzthiago! Is this your area of expertise? Would really appreciate your input; this is holding up our Flex Consumption deployment, unfortunately.
Hi @dsdilpreet - thank you for pinging me, and for sharing a repro. I also added a Linux Consumption app to the test, and for the initial message I can see the same results as you (with Linux Consumption being similar to Flex Consumption). There is likely a "scale from zero" optimization that was done to Windows Consumption that we need to bring to Flex Consumption here.
Can you share what you experience with subsequent messages? I.e., if you wait, say, 30 minutes, and send another message, and then a few more quickly, does the behavior and latency difference change for you? I believe Flex Consumption should be faster for those.
Hi @nzthiago - thanks for getting back.
I don't think it's just the cold start at play here. I have run the test you suggested again on my end (same setup as the repo). I sent the first message (which should be a cold start, because I hadn't sent anything to the topic for days) and then sent 3 more messages within seconds of the first one.
| Message | Windows Consumption | Flex Consumption | Notes |
|---|---|---|---|
| 1 (cold start) | ~12s | ~12s | roughly the same |
| 2 (warm) | pretty much instant | ~9s | |
| 3 (warm) | pretty much instant | ~10s | |
| 4 (warm) | pretty much instant | ~10s | |
It seems like Flex Consumption doesn't poll the Service Bus as frequently as Windows Consumption. Do you have any insights into this?
Thanks for your help so far.
@dsdilpreet thank you for the extra tests, appreciate it. We now understand why you are seeing these results. It is related both to how fast Flex Consumption scales in and to how quickly it checks for new messages in the queue. Anything beyond 30 seconds between tests could have the Flex Consumption app scaled back to zero, which was the case for your tests. Once that queue or topic gets busy, Flex Consumption will scale out and perform faster than Consumption.
We will discuss internally how to improve this, either with faster checks for new messages, by taking longer to scale in, or both. In the meantime, if you need a very fast response for that very first message, this can be mitigated by enabling one always ready instance for that function.
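For example, from memory the CLI for setting one always ready instance for a specific Service Bus triggered function looks roughly like this (the resource group, app, and function names are placeholders; please double check the current az CLI reference for the exact syntax):

```bash
# Keep one instance always ready for the function named ProcessOrder
az functionapp scale config always-ready set \
  --resource-group my-resource-group \
  --name my-flex-function-app \
  --settings function:ProcessOrder=1
```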
You would be able to see if the instance gets reused or if it's a new instance by looking at the cloud_RoleInstance field in an App Insights query against the traces table. Here's a sample query:
```kusto
// Computes how long each Service Bus message waited between being enqueued
// and the trigger firing, and shows which instance handled it.
traces
| parse-where message with "Trigger Details: MessageId: " MessageId ", SequenceNumber: " SequenceNumber ", DeliveryCount: " DeliveryCount ", EnqueuedTimeUtc: " EnqueuedTimeUtc ", " *
| extend LatencyToTriggerMs = datetime_diff("millisecond", timestamp, todatetime(EnqueuedTimeUtc))
| project timestamp, EnqueuedTimeUtc, LatencyToTriggerMs, cloud_RoleName, cloud_RoleInstance, MessageId, SequenceNumber, DeliveryCount
| order by EnqueuedTimeUtc asc
```
With that one Always Ready instance the Flex Consumption app triggers in milliseconds the first and subsequent times:
@nzthiago, you are right. When I send a message very quickly after an instance has started, the Flex Consumption plan does process it pretty much instantly. But the app seems to scale in after about 30 seconds irrespective of traffic, i.e. even if I keep sending messages it will still scale in, and the odd message then has to start a new instance.
We also tried an always ready instance and it seems to remedy the problem, but we have a lot of subscriptions, so always ready instances would have a significant cost implication for our solution.
It would be great if you could improve the polling / scaling as you mentioned in your previous comment. Is there any way we can track the progress of this, as we would like to use Flex Consumption going forward once this is fixed?
Thank you for replicating this on your end and all your help so far!
@dsdilpreet - we now have in our backlog to introduce a "last instance per function group / individual function remains for 10 minutes" feature, to mitigate the behavior you identified of the app scaling in too fast. This will likely take a few months to be implemented and roll out, so the workaround shared above is recommended for now, even though it might not be the best for your implementation. I will update our documentation once it does roll out, thank you for highlighting this! @pragnagopa @alrod FYI.
Thank you! @nzthiago
A rough timeline would help as well.
Would this also address the Service Bus polling frequency issue, where a message can sometimes sit in the bus for a while before an instance even begins to initialize?
Similarly for Service Bus, that initial message could take up to ~12s, but this can be mitigated with always ready instances.
Is there any chance this could be improved to be event-based in the future? We have a lot of functions that run infrequently, but we want them to wake up immediately when a message is sent. Paying for an always ready instance removes the scale-to-zero cost benefit of a function.
Hi team - we are working to make the last worker remain for a lot longer (10 minutes) than it currently does (30 seconds); hopefully this mitigates some of this and matches what Consumption currently does. Otherwise those ~12 seconds will occur unless always ready is configured for the specific function, or perhaps switching to a different messaging service like Event Grid, which pushes to Azure Functions, would be the way to go.
@nzthiago the 10 minutes/30 seconds "timeout" you are talking about - is it measured from the last cold start or from the last function execution?
(I would prefer the latter)
It will be more aligned with the last execution - but it is actually going to be from the last time the scale controller votes to scale in. Any scale out, or a 'keep the same number of instances' vote, resets the 10 minute counter.
Hi @nzthiago I'm experiencing the same delays on a Service Bus trigger even though it's configured with 2 always ready instances for the function. Any suggestions for host settings to get this to work as expected with the warm instances?
We tested an always ready instance on empty / sample functions a few months ago when @nzthiago suggested it as a potential workaround, and it seemed to work. However, when I applied an always ready instance to our actual function app, it didn't seem to work (same as @squiso). Not sure if this is a config difference between the two function apps or whether something has changed. Just an FYI.
I'm also having the same issue as @squiso with the always ready instances not running when the settings @nzthiago suggested above are set. We're also seeing the function being torn down and new instances started approximately every 30 seconds while we're processing messages from the Service Bus queue, even when messages are being sent as frequently as every 100ms.
As it stands we can't use Flex Consumption in production because the ~10 seconds it takes to switch over to the new instances is far too long a delay for our use case.
@paulparkermri makes me feel slightly better knowing it's not just our Azure Functions configuration then. We're also running in production, which is causing delays in our platform's processing, so we're looking to redeploy with the Premium plan to work around this issue. Annoyingly, Flex has been working fine all year otherwise.
Hi team - it seems we caught a regression: there's an issue with case sensitivity when applying the always ready instance count for individual functions (as is the case for Service Bus). We are working to roll out a fix ASAP. In the meantime, the workaround, if at all possible for your scenario, would be to update your function code so the function name is all lowercase and redeploy.
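For example, in the isolated model that would look something like this (illustrative names; the rest of the function stays the same):

```csharp
using Microsoft.Azure.Functions.Worker;

public class OrderProcessor
{
    // Workaround sketch: an all-lowercase function name so the per-function
    // always ready setting (e.g. function:processorder=1) still matches
    // despite the case sensitivity regression.
    [Function("processorder")]
    public void Run(
        [ServiceBusTrigger("my-topic", "my-subscription", Connection = "ServiceBusConnection")]
        string message)
    {
        // process the message as before
    }
}
```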
Hi @nzthiago thanks for the update, and great that you've identified the issue. We actually noticed the always ready instance name was being overridden to lowercase but figured it must be case insensitive. We've switched our functions over to the Premium plan and are now back to sub-30ms processing times for Service Bus.
Hi everyone, the fix for that casing difference has rolled out globally.
Just as an update - we are still working on the updates and rollout for the original concern. Due to priorities and some edge cases identified, this is now expected to roll out by end of 2025.
Hi @nzthiago and team, hope you are doing well. I am following up to see if you have an update for us, are we getting close to deploy these changes? Thanks :)
Hi @dsdilpreet (@alrod @pragnagopa FYI). We have rolled out one part of the solution that can help mitigate this, but there's a second one that is behind in the implementation and roll out before it fully addresses this. Is it possible for you to re-test and let us know if the issue still persists for you?
@nzthiago, yes, happy to test. So there were two issues - the function instance was scaling in too fast, and Service Bus polling wasn't quick enough, causing the message to stay in the subscription for a while before a new instance would start and process it. Would you be able to give a bit more detail on the changes you have rolled out and what the expected behaviour should be now?
@dsdilpreet sorry I missed replying to this. What we have currently implemented is that instead of the last instance shutting down after ~30 seconds without activity, it will now remain active for 10 minutes, like Consumption did. There are further fixes planned that will help with this, depending on when new executions happen.
Hi @nzthiago, all good!
We have a couple of workloads running on Flex now and can confirm that the function remains active for 10 minutes. When it does cold start, it takes about 10-15 seconds to process a message. I'm not sure if this delay is due to the function instance starting up or to the Service Bus trigger itself keeping the message in the queue. We've also seen a couple of outliers where it took up to 24 seconds. HTTP cold start is much shorter.
Ideally, if we can consistently get cold start times into the 5-10 second range, we can migrate most of our workloads to Flex. For critical workloads, we will use always ready instances.
I have a couple of questions. What is the expected cold start behaviour if there's a constant stream of messages? For example, if there's a message every 5 minutes, would every second message still experience a cold start, or does the 10 minute timer reset with each message? You mentioned there are further fixes planned - what improvements can we expect from those?
Thanks! :)
Hi @dsdilpreet - will your workloads be mostly sporadic (i.e., not constant), which would allow that last instance to fully scale to zero, is that correct? Are you able to share the name of your test function app?
Hi @nzthiago, we have some workloads with sustained traffic, although they can become sporadic during certain parts of the day; most workloads are sporadic.
Is there a way to message you privately? I can share the name there. I have rolled out Flex to our QA and prod environments for some workloads.
This is a small sample size, but as a rough guide I have attached CSV log files from one of our sites (this isn't all the traffic, just one site). The first log file is with no always ready instance, the second with 1 always ready instance, and the third with 2 always ready instances.
Two always ready instances do help a lot in this case, but the worst cold start is still more than 20 seconds.