azure-functions-python-worker
Azure Blob trigger function AKS trigger issue
I have a blob-triggered Azure Function deployed on AKS, with KEDA scaling based on the blob entries. I used Azure/azure-functions-host#10624 to make each function instance accept only one blob item. The problem is that all of the created pods read the same file, whereas with queue-based triggers and scaling, different queue elements are read by different function instances. As I understand it, the blob trigger internally uses queues to do its work, so why is the behaviour different from the queue trigger?
P.S.: I am moving the files to a different folder after the process is completed.
host.json
{
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": {
        "isEnabled": true,
        "excludedTypes": "Request"
      }
    }
  },
  "extensions": {
    "blobs": {
      "maxDegreeOfParallelism": 1
    }
  },
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[4.*, 5.0.0)"
  }
}
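For the queue-based setup the post compares against, one-at-a-time processing is normally configured through the queues extension in host.json instead. A sketch (values are assumptions, not from the post):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "batchSize": 1,
      "newBatchThreshold": 0
    }
  }
}
```

With `batchSize: 1` and `newBatchThreshold: 0`, each host instance dequeues a single message at a time, which matches the one-item-per-pod behaviour described for the queue trigger.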
host: 2.0

App settings:
"FUNCTIONS_WORKER_RUNTIME": "python",
"AzureWebJobsFeatureFlags": "EnableWorkerIndexing",
"PYTHON_ISOLATE_WORKER_DEPENDENCIES": "1",
import logging

import azure.functions as func

from blob_helper import initialize_blob_service_client, upload_dataframe_to_blob

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.function_name(name="PythonFunction")
@app.blob_trigger(
    arg_name="myblob",
    path="sheets/input/{name}",  # Blob path for trigger
    connection="DataLakeConnectionString"
)
def python_function(myblob: func.InputStream):
    logging.info("Processing blob: %s", myblob.name)
    # (rest of the body elided in the original post)
keda
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: python-fuction-scaler
  namespace: prod
spec:
  scaleTargetRef:
    name: python-fuction
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: azure-blob
      metadata:
        blobContainerName: "sheets"
        blobPrefix: "input"
        connectionFromEnv: "DataLakeConnectionString"
        targetBlobCount: "1"
      authenticationRef:
        name: secrets
What I was able to find out is that each instance of the blob-triggered function creates its own new queue and sets a lock in it. Is it possible to have one common queue for all instances? I think that would solve my issue.
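A toy model of that finding (hypothetical, not the actual Functions host internals): if every pod builds its own control queue from the same container listing, a lock inside each private queue cannot deduplicate work across pods, while a single shared queue naturally hands out distinct items:

```python
# Toy model only -- queue names and helpers here are made up for
# illustration; this is NOT the Azure Functions host implementation.
from queue import Queue, Empty

blobs = ["Blob1", "Blob2", "Blob3", "Blob4"]

def pod_with_private_queue():
    """Each pod builds its own control queue from the same listing."""
    q = Queue()              # private queue, e.g. Input_Files_12342
    for b in blobs:
        q.put(b)
    return q.get()           # the "lock" only guards this pod's queue

# Current behaviour: both pods dequeue Blob1 from their private queues.
assert pod_with_private_queue() == "Blob1"
assert pod_with_private_queue() == "Blob1"

# With one shared queue the lock is global, so pods get distinct blobs.
shared = Queue()
for b in blobs:
    shared.put(b)

def pod_with_shared_queue(q):
    try:
        return q.get_nowait()
    except Empty:
        return None

first = pod_with_shared_queue(shared)
second = pod_with_shared_queue(shared)
assert (first, second) == ("Blob1", "Blob2")
```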
Hello @anime-shed, thank you for sharing your findings. As far as I have worked with it, it is not possible to have a common queue in that scenario, but please validate this document and let me know if it helps you: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-trigger?tabs=python-v2%2Cisolated-process%2Cnodejs-v4%2Cextensionv5&pivots=programming-language-csharp
@JAdluri, from what I am able to get from the documentation you attached, I can provide the
Queue Service URI (required for blob triggers): <CONNECTION_NAME_PREFIX>__queueServiceUri
but that will not help me with the issue I am facing.
I want my blob-triggered function to know that a particular blob is already being read in another pod, so it can pick up the next blob instead.

E.g., current behaviour:
Blob container: Input_Files, holding Blob1, Blob2, Blob3, Blob4, ...
AKS pods:
Pod 1: trigger reads Blob1, creates Input_Files_12342, and locks the element in that queue
Pod 2: trigger reads Blob1, creates Input_Files_34342, and locks the element in that queue

Expected behaviour:
Blob container: Input_Files, holding Blob1, Blob2, Blob3, Blob4, ...
AKS pods:
Pod 1: trigger reads Blob1, creates Input_Files_12342, and locks the element in that queue
Pod 2: trigger finds Blob1 already taken, so it reads Blob2, adds it to Input_Files_34342, and locks the element in that queue
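Since the post reports that queue-based triggers already distribute items across pods correctly, one workaround is to enqueue blob names into a storage queue and scale on that instead. A sketch of the KEDA side of that approach (the queue name is an assumption; other names are adapted from the manifest above):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: python-fuction-queue-scaler   # hypothetical name
  namespace: prod
spec:
  scaleTargetRef:
    name: python-fuction
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: azure-queue
      metadata:
        queueName: input-files        # assumed queue holding blob names
        queueLength: "1"
        connectionFromEnv: "DataLakeConnectionString"
      authenticationRef:
        name: secrets
```

Each message is then leased by exactly one pod, which gives the expected Pod 1 → Blob1, Pod 2 → Blob2 distribution.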