azure-functions-host icon indicating copy to clipboard operation
azure-functions-host copied to clipboard

TimerTrigger in Azure Function in Docker container on Kubernetes runs on every instance (.NET Core)

Open MrImpossibru opened this issue 4 years ago • 27 comments

TimerTrigger to run only once for all instances makes a lease in a BlobStorage. Name of the container depends on a Machine name in case of Kubernetes and it's unique for all instances. Because of that TimerTrigger runs on all instances at the same time.

As temporary solution I have to use env. variables WEBSITE_SITE_NAME and WEBSITE_INSTANCE_ID to make it work as expected, but maybe it's better to provide a way to override default container name for Azure Functions.

MrImpossibru avatar May 07 '20 10:05 MrImpossibru

@divyagandhii - I feel like we faced a similar issue with Linux. If so, mind weighing in?

mhoeger avatar May 07 '20 17:05 mhoeger

@mhoeger - you are right. I fixed this for linux consumption here: https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L53

We might have to do something similar for Kubernetes as well.

divyagandhisethi avatar May 11 '20 16:05 divyagandhisethi

cc @SatishRanjan

divyagandhisethi avatar May 11 '20 16:05 divyagandhisethi

cc @anirudhgarg

divyagandhisethi avatar May 11 '20 22:05 divyagandhisethi

@divyagandhii - do you know what the status of this bug is?

mhoeger avatar Jul 20 '20 19:07 mhoeger

I am also still seeing this on the .NET 6 isolated runtime Docker image. I see that this has been open for two years, are there any updates? This is a blocking issue that prevents my team from properly scaling our timer-triggered functions since in order to prevent duplicate runs, we can only run one replica at a time…

esimkowitz avatar Mar 01 '22 21:03 esimkowitz

Assigning to @JatinSanghvi to have a look,

anirudhgarg avatar Mar 01 '22 21:03 anirudhgarg

@esimkowitz Last time I had this issue I refactored my service to use WebJobs SDK instead of Azure Functions. It uses twice less memory and works more predictable in a container in Kubernetes.

MrImpossibru avatar Mar 02 '22 07:03 MrImpossibru

Also to clarify, we use Kubernetes to host our Azure Functions so when I say instance/replica of the function app, I am referring to separate pods running the same function app container with the same storage account.

@MasterKolka unfortunately I don't think WebJobs is an option for us since we use the other triggering mechanisms of Azure Functions pretty heavily.

esimkowitz avatar Mar 02 '22 20:03 esimkowitz

@esimkowitz As I remember the main problem is to give a different "instance name" for each pod (which is required to prevent multiple executions by different instances). With a webjobs sdk this could be solved in some way. And WebJobs SDK supports all Azure Functions triggers except http (which could be solved by using ASP Controllers). Azure Functions SDK is based on WebJobs SDK.

MrImpossibru avatar Mar 03 '22 07:03 MrImpossibru

@lpapudippu Any updates on the investigation here?

esimkowitz avatar Mar 10 '22 01:03 esimkowitz

@v-shenoy Please look in to it. cc: @raorugan

lpapudippu avatar Mar 10 '22 06:03 lpapudippu

Taking a look into this.

v-shenoy avatar Mar 10 '22 07:03 v-shenoy

This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the AzureFunctionsWebHost__hostid environment variable on the containers.

https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66

As Divya mentions, I could make changes here to include

else if (environment.IsContainer())
{
    hostId = <set-host-id-to-something>;
}

However, I am not sure what to use as hostId that will be same for all the containers running the image.

v-shenoy avatar Mar 11 '22 09:03 v-shenoy

cc @pragnagopa .

@v-shenoy The host id needs to be something unique per pod instance. For example - you can use the pod name or pod id. In other Linux skus, we use the container name.

divyagandhisethi avatar Mar 11 '22 17:03 divyagandhisethi

This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the AzureFunctionsWebHost__hostid environment variable on the containers.

https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66

As Divya mentions, I could make changes here to include

else if (environment.IsContainer())
{
    hostId = <set-host-id-to-something>;
}

However, I am not sure what to use as hostId that will be same for all the containers running the image.

Also, just checking environment.IsContainer() won't be sufficient. You need to explicitly check if we are in a managed kubernetes environment. There are helper functions already implemented for that.

divyagandhisethi avatar Mar 11 '22 17:03 divyagandhisethi

This problem has had multiple issues created for it. Here's one by @TsuyoshiUshio. The work around he mentions (and I tested that it works) is to manually set the AzureFunctionsWebHost__hostid environment variable on the containers. https://github.com/Azure/azure-functions-host/blob/d571d25276c72778339c78d48c14c3c680d52b82/src/WebJobs.Script/Host/ScriptHostIdProvider.cs#L49-L66

As Divya mentions, I could make changes here to include

else if (environment.IsContainer())
{
    hostId = <set-host-id-to-something>;
}

However, I am not sure what to use as hostId that will be same for all the containers running the image.

Also, just checking environment.IsContainer() won't be sufficient. You need to explicitly check if we are in a managed kubernetes environment. There are helper functions already implemented for that.

It doesn't have to be managed kubernetes, right? The issue still happens when deploying to a kubernetes cluster directly, using func kubernetes deploy

v-shenoy avatar Mar 12 '22 03:03 v-shenoy

cc @pragnagopa .

@v-shenoy The host id needs to be something unique per pod instance. For example - you can use the pod name or pod id. In other Linux skus, we use the container name.

@divyagandhisethi It already uses pod name attached, with a hash.

image image

Also, it shouldn't be unique per pod instance, right? There could be multiple pods running the same function image as @esimkowitz does (and also shown in the image). You want all of these pods to get the same host id for timer trigger.

v-shenoy avatar Mar 12 '22 03:03 v-shenoy

I agree with Vighnesh. The issue is caused because the function apps running inside the container are using machine name as host ID, which is unique for each pod. We are not looking for unique ID per pod here, but an ID that's common across all pod instances for the same function app image, but distinct from the ID used by pods belonging to a different function app image.

I am inclined to use hash generated from container image path (sans the tag name). This ensures that if container code is upgraded to point to new tag of the same container image, the newly created pods continue from the same checkpoint where previous pods had left. This is as not as critical for timer-trigger as it might be for triggers for tabular event sources.

It should probably also work if we use assembly name containing the functions as host ID as being done in DefaultHostIdProvider.

Feel free to add comment if you see any issues with using assembly name, or know of a better method for host ID computation.

JatinSanghvi avatar Mar 12 '22 15:03 JatinSanghvi

I have incorporated @v-shenoy's workaround of setting the hostId environment variable and have confirmed that it unblocks my scenario (in our case we deploy one function per namespace and the namespace names are consistent across clusters). Looking forward to your permanent solution!

esimkowitz avatar Mar 15 '22 22:03 esimkowitz

A permanent solution that requires change in host ID generation logic, will also be a change with potential for trigger misses. There will be users who are having just a single container up and running. If we change the host ID generation logic and user updates their function host version, their next app revision will start from a clean state instead of continuing from the previous state.

JatinSanghvi avatar Mar 16 '22 11:03 JatinSanghvi

On second thought, every new container should already be getting assigned a separate host name which does not get maintained between deployments. So we will not be breaking an existing working functionality in that sense.

JatinSanghvi avatar Mar 16 '22 12:03 JatinSanghvi

Maybe it would be useful to just document the host ID behavior publicly and let service devs pick a value they know will be consistent for their applications?

esimkowitz avatar Mar 16 '22 20:03 esimkowitz

Hi @esimkowitz, I am personally leaning towards just having a proper documentation in place. I did not find an environment variable inside running function app container that meets the requirement to be used as host ID. We thought of using 'container image path' but I don't think that information is available to be inserted automatically while composing Dockerfile, building the docker image or running the container. It will only need to be passed by user at some point, which is no less manual than setting hostid environment variable.

We also thought of having a logic in functions host similar to DefaultHostIdProvider but @v-shenoy pointed out that it won't work for non-.NET function apps.

JatinSanghvi avatar Mar 17 '22 02:03 JatinSanghvi

@divyagandhisethi

v-shenoy avatar Mar 21 '22 05:03 v-shenoy

Tagging @fabiocav for his input

pragnagopa avatar Mar 23 '22 17:03 pragnagopa

So are we supposed to set hostid to something static when running in Kubernetes?

If I have a Deployment containing a function app and it is scaled (by Keda, or manually) to 2 instances, should the hostid be the same for both replicas of the application?

I've tried doing this and it seems to work fine, but it's really unclear from the documentation and the thread above. I'll note that when doing this, there will be one host file created in blob storage, and the metadata on that file will contain a FunctionInstance property, which I suppose is doubly confusing since there's two instances/replicas using this hostid. I did notice that if you scale the application to 0 and then back up again, the FunctionInstance metadata will be updated to reflect some new instance.

I'm not sure if this is an indicator of the fact that it's the wrong thing to be doing. Can anyone clarify the proper thing to do?

pinkfloydx33 avatar Apr 21 '24 14:04 pinkfloydx33