azure-functions-host
azure-functions-host copied to clipboard
TimerTrigger in Azure Function in Docker container on Kubernetes runs on every instance (.NET Core)
TimerTrigger to run only once for all instances makes a lease in a BlobStorage. Name of the container depends on a Machine name in case of Kubernetes and it's unique for all instances. Because of that TimerTrigger runs on all instances at the same time.
As temporary solution I have to use env. variables WEBSITE_SITE_NAME and WEBSITE_INSTANCE_ID to make it work as expected, but maybe it's better to provide a way to override default container name for Azure Functions.
@divyagandhii - I feel like we faced a similar issue with Linux. If so, mind weighing in?
@mhoeger - you are right. I fixed this for linux consumption here: https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L53
We might have to do something similar for Kubernetes as well.
cc @SatishRanjan
cc @anirudhgarg
@divyagandhii - do you know what the status of this bug is?
I am also still seeing this on the .NET 6 isolated runtime Docker image. I see that this has been open for two years, are there any updates? This is a blocking issue that prevents my team from properly scaling our timer-triggered functions since in order to prevent duplicate runs, we can only run one replica at a time…
Assigning to @JatinSanghvi to have a look,
@esimkowitz Last time I had this issue I refactored my service to use WebJobs SDK instead of Azure Functions. It uses twice less memory and works more predictable in a container in Kubernetes.
Also to clarify, we use Kubernetes to host our Azure Functions so when I say instance/replica of the function app, I am referring to separate pods running the same function app container with the same storage account.
@MasterKolka unfortunately I don't think WebJobs is an option for us since we use the other triggering mechanisms of Azure Functions pretty heavily.
@esimkowitz As I remember the main problem is to give a different "instance name" for each pod (which is required to prevent multiple executions by different instances). With a webjobs sdk this could be solved in some way. And WebJobs SDK supports all Azure Functions triggers except http (which could be solved by using ASP Controllers). Azure Functions SDK is based on WebJobs SDK.
@lpapudippu Any updates on the investigation here?
@v-shenoy Please look in to it. cc: @raorugan
Taking a look into this.
This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the AzureFunctionsWebHost__hostid
environment variable on the containers.
https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66
As Divya mentions, I could make changes here to include
else if (environment.IsContainer())
{
hostId = <set-host-id-to-something>;
}
However, I am not sure what to use as hostId
that will be same for all the containers running the image.
cc @pragnagopa .
@v-shenoy The host id needs to be something unique per pod instance. For example - you can use the pod name or pod id. In other Linux skus, we use the container name.
This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the
AzureFunctionsWebHost__hostid
environment variable on the containers.https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66
As Divya mentions, I could make changes here to include
else if (environment.IsContainer()) { hostId = <set-host-id-to-something>; }
However, I am not sure what to use as
hostId
that will be same for all the containers running the image.
Also, just checking environment.IsContainer() won't be sufficient. You need to explicitly check if we are in a managed kubernetes environment. There are helper functions already implemented for that.
This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the
AzureFunctionsWebHost__hostid
environment variable on the containers. https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66As Divya mentions, I could make changes here to include
else if (environment.IsContainer()) { hostId = <set-host-id-to-something>; }
However, I am not sure what to use as
hostId
that will be same for all the containers running the image.Also, just checking environment.IsContainer() won't be sufficient. You need to explicitly check if we are in a managed kubernetes environment. There are helper functions already implemented for that.
It doesn't have to be managed kubernetes, right? The issue still happens when deploying to a kubernetes cluster directly, using func kubernetes deploy
cc @pragnagopa .
@v-shenoy The host id needs to be something unique per pod instance. For example - you can use the pod name or pod id. In other Linux skus, we use the container name.
@divyagandhisethi It already uses pod name attached, with a hash.
Also, it shouldn't be unique per pod instance, right? There could be multiple pods running the same function image as @esimkowitz does (and also shown in the image). You want all of these pods to get the same host id for timer trigger.
I agree with Vighnesh. The issue is caused because the function apps running inside the container are using machine name as host ID, which is unique for each pod. We are not looking for unique ID per pod here, but an ID that's common across all pod instances for the same function app image, but distinct from the ID used by pods belonging to a different function app image.
I am inclined to use hash generated from container image path (sans the tag name). This ensures that if container code is upgraded to point to new tag of the same container image, the newly created pods continue from the same checkpoint where previous pods had left. This is as not as critical for timer-trigger as it might be for triggers for tabular event sources.
It should probably also work if we use assembly name containing the functions as host ID as being done in DefaultHostIdProvider.
Feel free to add comment if you see any issues with using assembly name, or know of a better method for host ID computation.
I have incorporated @v-shenoy's workaround of setting the hostId environment variable and have confirmed that it unblocks my scenario (in our case we deploy one function per namespace and the namespace names are consistent across clusters). Looking forward to your permanent solution!
A permanent solution that requires change in host ID generation logic, will also be a change with potential for trigger misses. There will be users who are having just a single container up and running. If we change the host ID generation logic and user updates their function host version, their next app revision will start from a clean state instead of continuing from the previous state.
On second thought, every new container should already be getting assigned a separate host name which does not get maintained between deployments. So we will not be breaking an existing working functionality in that sense.
Maybe it would be useful to just document the host ID behavior publicly and let service devs pick a value they know will be consistent for their applications?
Hi @esimkowitz, I am personally leaning towards just having a proper documentation in place. I did not find an environment variable inside running function app container that meets the requirement to be used as host ID. We thought of using 'container image path' but I don't think that information is available to be inserted automatically while composing Dockerfile, building the docker image or running the container. It will only need to be passed by user at some point, which is no less manual than setting hostid
environment variable.
We also thought of having a logic in functions host similar to DefaultHostIdProvider but @v-shenoy pointed out that it won't work for non-.NET function apps.
@divyagandhisethi
Tagging @fabiocav for his input
So are we supposed to set hostid
to something static when running in Kubernetes?
If I have a Deployment
containing a function app and it is scaled (by Keda, or manually) to 2 instances, should the hostid
be the same for both replicas of the application?
I've tried doing this and it seems to work fine, but it's really unclear from the documentation and the thread above. I'll note that when doing this, there will be one host
file created in blob storage, and the metadata on that file will contain a FunctionInstance
property, which I suppose is doubly confusing since there's two instances/replicas using this hostid
. I did notice that if you scale the application to 0
and then back up again, the FunctionInstance
metadata will be updated to reflect some new instance.
I'm not sure if this is an indicator of the fact that it's the wrong thing to be doing. Can anyone clarify the proper thing to do?