
Logic App Standard - Slow Performance - Loops


I created a simple workflow with actions that build an array of x items, loop through the items, and append the current time to another array variable, and the performance seems extremely slow/limited. I’ve amended multiple host settings that are documented to increase throughput (such as Jobs.BackgroundJobs.NumWorkersPerProcessorCount), as well as concurrency settings, with no significant impact. With the for each loop concurrency set to its maximum (50 concurrent iterations), the workflow completes in the following approximate times:

- 500 items: 30 seconds
- 1000 items: 2+ minutes
- 5000 items: 30 minutes

I’d expect all of these to complete in under a minute, with the lower end finishing within milliseconds. There does not appear to be an issue with the underlying compute: CPU and memory never exceed 70% and the plan never scales even under load, which suggests either a limitation of the product, configuration restricting throughput, or a bottleneck somewhere such as the associated storage account.
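For reference, the host settings I’ve been tuning live in the Logic App Standard project’s host.json, roughly like this (the value shown is only illustrative, and the exact nesting should be checked against the Microsoft docs on host and app settings):

{
    "version": "2.0",
    "extensions": {
        "workflow": {
            "settings": {
                "Jobs.BackgroundJobs.NumWorkersPerProcessorCount": "14"
            }
        }
    }
}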

I know there is a SplitOn() feature which can be used to split work across separate workflow runs, but I have a requirement to extract hundreds of thousands of messages from a Service Bus and transform them into a single batch. I'm hoping to understand whether there is a configuration issue that can be fixed to massively improve throughput, or whether the limitation is inherent to the product and we need to switch to Azure Functions.

An example workflow which can be used for testing is below:

{
    "definition": {
        "$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
        "actions": {
            "Compose": {
                "inputs": "@variables('testArray')",
                "runAfter": {
                    "For_each": [
                        "Succeeded"
                    ]
                },
                "type": "Compose"
            },
            "Execute_JavaScript_Code": {
                "inputs": {
                    "code": "return Array.from({length: 5000}, (x, i) => i);"
                },
                "runAfter": {
                    "Initialize_variable": [
                        "Succeeded"
                    ]
                },
                "type": "JavaScriptCode"
            },
            "For_each": {
                "actions": {
                    "Append_to_array_variable": {
                        "inputs": {
                            "name": "testArray",
                            "value": "@{items('For_each')}: @{utcNow()}"
                        },
                        "runAfter": {},
                        "type": "AppendToArrayVariable"
                    }
                },
                "foreach": "@outputs('Execute_JavaScript_Code')",
                "runAfter": {
                    "Execute_JavaScript_Code": [
                        "Succeeded"
                    ]
                },
                "type": "Foreach"
            },
            "Initialize_variable": {
                "inputs": {
                    "variables": [
                        {
                            "name": "testArray",
                            "type": "array"
                        }
                    ]
                },
                "runAfter": {},
                "type": "InitializeVariable"
            }
        },
        "contentVersion": "1.0.0.0",
        "outputs": {},
        "triggers": {
            "Recurrence": {
                "recurrence": {
                    "frequency": "Day",
                    "interval": 15
                },
                "type": "Recurrence"
            }
        }
    },
    "kind": "Stateful"
}

JamieH-risual avatar Mar 30 '22 09:03 JamieH-risual

Thanks for the details @JamieH-risual. Optimizing foreach loop performance is something we're actively working on. Would stateless workflows work for your scenario? They are typically much more performant than their stateful counterparts.

hongzli avatar Mar 31 '22 00:03 hongzli

Hi @hongzli, apologies I missed this response. Unfortunately not; the stateless limits are far too low for our real-world situations. In one example we need to get 10,000+ messages from a Service Bus topic, transform them, and send them to a consumer as a single CSV. This means wrapping the 'Get messages from a topic subscription' action in a do-until loop that runs until the body is empty, and inside that loop a for each iterates the messages from the topic subscription, passes each message to a Transform JSON to JSON action, and adds the output to an array.

The reason I loop through messages individually rather than passing the full array into the liquid map is resiliency: if a single item has bad data I can handle that one failure and still send the rest of the messages to the consumer. The reason for appending to an array is that we can only get 175 messages maximum from the topic, so we have to keep making trips to fetch additional messages to transform and add to the array, since our consumer expects the full batch. Stateless workflows limit us to 100 items in a for each and 100 iterations in a do until, meaning we can only process 1,000 items without an overly complex workflow.

JamieH-risual avatar Apr 11 '22 16:04 JamieH-risual

Looking at your Logic App and the flow diagram, I want to emphasize that nested foreach/until loops won’t give the best performance, because on each iteration the scope is persisted to storage along with the content of the loop, variables and all.

You can further break down the main loop by calling another child Logic App with batching, and/or you can use inline code (JavaScript), which will give you much better performance. Inline code: Add and run code snippets by using inline code - Azure Logic Apps | Microsoft Docs

I understand this will cost you development time, and a script is harder to maintain than actions on the designer, but foreach and until loops won’t perform well with nesting for the time being. A lot of work is being done by the Product Group on the performance of loops, and for stateless Logic Apps the PG is working to make execution fully in-memory, which will improve performance drastically.

Instead of using Append to array variable, you can remove this loop and replace it with inline code, or with a Parse JSON action or Select action to extract the array values you want to use later. Please have a look at the Parse and Select actions: Perform operations on data - Azure Logic Apps | Microsoft Docs
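As a rough illustration only (the action name here is made up, and this mapping produces an array of objects rather than the concatenated strings from the test workflow), the foreach/append pair above could be collapsed into a single Select action:

"Select_timestamped_items": {
    "inputs": {
        "from": "@outputs('Execute_JavaScript_Code')",
        "select": {
            "item": "@item()",
            "timestamp": "@utcNow()"
        }
    },
    "runAfter": {
        "Execute_JavaScript_Code": [
            "Succeeded"
        ]
    },
    "type": "Select"
}

The Select output can then be referenced directly in later actions (for example the Compose) instead of the testArray variable.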

Another thing: if you are testing with debugging enabled, that will degrade performance as well.

The idea here is to eliminate as many nested loops as possible and to use inline code and built-in actions where possible. When that can’t be done, it is better to use a nested (child) Logic App.

The performance of the foreach loop has been discussed thoroughly with the Product Group over the past few months. The foreach loop is not intended for data transformation or heavy loads; assigning variables in foreach loops becomes a heavy scenario when the iteration count is high, because the foreach persists its state to storage and the round trips add up to the total process time. A better solution is to use inline code for this purpose, which will give you much better results.
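For example, a minimal inline code sketch along those lines; the workflowContext path is an assumption and needs to match how the upstream action actually exposes its output:

// Runs inside an Execute JavaScript Code (inline code) action.
// "Execute_JavaScript_Code" is the earlier action that produced the raw array;
// adjust the path (.outputs vs .outputs.body) to its actual output shape.
var items = workflowContext.actions.Execute_JavaScript_Code.outputs;

// Build the whole timestamped result in memory - no per-item storage round trips.
return items.map(function (i) {
    return i + ": " + new Date().toISOString();
});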

Here is the input from the Product Group that I have gathered from multiple similar cases:

  1. The particular pattern referenced here is not really intended for the scenario it is being used for. The “Foreach” scope is intended for scenarios that require durable fan-out and fan-in, not for the data transformation it is being used for here. The recommended pattern for such in-memory data transformations is either inline JavaScript or other actions like Query, Select and expressions.
     a. Here is an example of using Filter Array to reduce the number of loops prior to the foreach, if this applies to your scenario: Use Logic Apps Filter Array to reduce actions in For-Each - techstuff (bergstrom.nu) and Perform operations on data - Azure Logic Apps | Microsoft Docs
     b. Use a Select action prior to the foreach loop to eliminate assigning the variables inside the loop: Perform operations on data - Azure Logic Apps | Microsoft Docs
     c. Use inline code: Add and run code snippets by using inline code - Azure Logic Apps | Microsoft Docs and Getting the Latest Array Item with Inline Script in Logic App | DevKimchi (this article refers to Logic App Consumption, which is why it uses an Integration Account; in Logic App Standard you don’t require an Integration Account - Single-tenant versus multi-tenant Azure Logic Apps - Azure Logic Apps | Microsoft Docs)
  2. That being said, improving performance in general is a priority for the Product Group, and they are working on a number of areas to improve latency and throughput in some scenarios. For example, they are actively investigating and optimizing areas related to stateless workflow performance this semester, and some of these improvements will apply to stateful workflows as well.
  3. The foreach loop with a variable is slow by design, because each time the array variable is updated there is a round trip to storage, and there is contention as well.
  4. Using a foreach loop with variable assignment (Logic Apps foreach and variables - NETWORG Blog (thenetw.org) and Add loops to repeat actions - Azure Logic Apps | Microsoft Docs) can also lead to concurrency issues, as multiple parallel threads try to update the variable in storage at the same time and we have to do backoff retries to resolve conflicts.
  5. In stateful mode, we have to checkpoint the result of every iteration in storage in case the app crashes in the middle. This is what provides durability against transient errors and, as a result, comes with some latency cost. In stateless mode we optimize a lot, which is why perceived performance is noticeably better: we have seen 20-50% less time in stateless mode compared to stateful, depending on the scenario.
  6. In stateful mode, scoped actions taking multiple seconds to complete is by design, given the way the runtime aggregates data. In stateless mode it should be better; you can test using stateless mode and see if performance improves. Please note the differences between stateful and stateless Logic Apps: Single-tenant versus multi-tenant Azure Logic Apps - Azure Logic Apps | Microsoft Docs
  7. Another way around this is to use a slightly different pattern, where you debatch to another Logic App. For example, you have one logic app that generates the array you want to loop through, and a second logic app that accepts an HTTP request as a trigger. On the second logic app, enable the SplitOn function in the trigger settings. Put your 'work' in the second logic app, then go back to the first and make its last step send the array to the second logic app. With this pattern the first logic app initiates a run for each item in the array, and they all run concurrently rather than being limited to the concurrency of the for each loop (see the sketch after this list). There is more detail in the documentation about this function: Schema reference for trigger and action types - Azure Logic Apps | Microsoft Docs
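To make item 7 concrete, a minimal sketch of what the second (debatching) workflow's trigger could look like with splitOn enabled, assuming the first workflow posts the raw array as the HTTP request body:

"triggers": {
    "manual": {
        "inputs": {
            "schema": {}
        },
        "kind": "Http",
        "splitOn": "@triggerBody()",
        "type": "Request"
    }
}

Each element of the posted array then starts its own run of the second workflow, so the per-item work is no longer bounded by the foreach concurrency limit.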

In reference to the above, if you are assigning a variable inside a foreach and the concurrency is not set to 1, this will give you unexpected results: the foreach iterations run in parallel rather than sequentially, so the data will be inconsistent.

As for calling the Liquid map from within inline code: it can’t be done directly by calling an action, but you can place that map inside another Logic App and call that Logic App from the inline code, even inside the for-each loop in that script, the same way you would call an HTTPS endpoint from JavaScript. That being said, I think a mix of Logic Apps and Function Apps might be the best solution for your issue: you can keep the main structure of your Logic Apps intact while calling a Function App to do the “for-each” work that is not meeting the performance thresholds you are seeking.

For batching we have an action that you can use; please have a look at the official documentation for it below, along with some external links for examples:

- Batch process messages as a group - Azure Logic Apps | Microsoft Docs
- How to debatch and batch in Logic Apps (aims.ai)
- Logic Apps Debatching | Codit
- Logic Apps Batching | Codit
- Batching Inside Logic App – Tutorial (alliedc.com)

Please find my notes inline below, after each question:

In regard to using inline code, can you advise, based on our real example, how this would work with the additional actions required?

• We have a requirement to get all of the messages from a Service Bus topic. Without having this as a trigger, how can we achieve this without a do until?
  Answer: If the do-until is affecting performance, maybe get all the messages using the trigger, persist them somewhere (another Service Bus, Event Hub, SQL, a JSON array, etc.), then pass this to another Logic App.

• We currently use the liquid transformation action to transform an individual message from the canonical format to our consumer format. Admittedly we could re-write our map to handle an array of messages rather than single items, but that impacts our resiliency: today, if there are 10 failures but 990 successful transformations, we can still write a file to the consumer and deal with the individual failures separately, whereas with a single map one bad item would fail the whole job. If we were to use inline code to iterate the array of messages, I don’t believe we can call the liquid transformation action from the inline code action, can we?
  Answer: To my knowledge, no, you can’t call a Liquid map from within inline code (answered above).

• Additionally, with inline code I don’t believe we can use the action to append to an existing array, so are you able to expand on your suggestion?
  Answer: In inline code you can pass an array to the code and append data there (a sketch follows below). Add and run code snippets by using inline code – Azure Logic Apps | Microsoft Docs and Schema reference for trigger and action types – Azure Logic Apps | Microsoft Docs
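A hedged sketch of that last point; the action name Collected_so_far and the trigger body shape are hypothetical and would need to match your actual workflow:

// Hypothetical action/trigger names - replace with the real ones.
// Read the array collected so far plus the newly fetched messages, then
// append in memory instead of calling "Append to array variable" per item.
var collected = workflowContext.actions.Collected_so_far.outputs || [];
var incoming = workflowContext.trigger.outputs.body || [];

incoming.forEach(function (msg) {
    collected.push(msg);
});

return collected;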

To summarize the requirements of the Logic App:
• Service Bus will contain X messages (let’s say 10,000)
• Each message requires transforming using a liquid map and produces an array of messages (likely between 1 and 10)
• This array of messages needs to be joined to all of the other arrays of messages, one per message in the Service Bus
• We then convert the final array of messages to a CSV and send it to the consumer system. The system expects a full batch in a single file.
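Purely as a sketch of that final step (all names here are hypothetical, and the built-in Create CSV table action is an alternative worth evaluating), the flatten-and-serialize work could be done in one inline code action:

// Hypothetical: "Transformed_batches" holds an array of arrays, one inner
// array per Service Bus message (the 1-10 messages each liquid map produced).
var batches = workflowContext.actions.Transformed_batches.outputs || [];

// Flatten the array of arrays into one batch.
var rows = [].concat.apply([], batches);

// Naive CSV rendering: assumes flat objects with identical keys and no
// commas or quotes inside values - a real implementation needs escaping.
var keys = Object.keys(rows[0] || {});
var lines = rows.map(function (row) {
    return keys.map(function (k) { return row[k]; }).join(",");
});

return [keys.join(",")].concat(lines).join("\r\n");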

OmarAbuArisheh avatar Jun 02 '22 10:06 OmarAbuArisheh

Some great advice here. I had a workflow with nested foreach loops; it was just adding strings to an array variable and doing string concatenation, with only 1 iteration of the outer foreach and 100 iterations of the inner foreach. It took 5 minutes to run the workflow. I replaced both foreach actions with inline JavaScript, and now the workflow takes 3 seconds. :-)
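Roughly, the replacement pattern looks like this (the field names are made up; the point is that the nesting and string building all happen in memory):

// Hypothetical input shape: an array of groups, each with an "items" array.
var groups = workflowContext.trigger.outputs.body || [];
var results = [];

groups.forEach(function (group) {
    (group.items || []).forEach(function (item) {
        // The concatenation that previously happened via
        // "Append to array variable" inside nested foreach loops.
        results.push(group.name + " - " + item);
    });
});

return results;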

asmason avatar Jun 09 '22 12:06 asmason

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Oct 20 '22 21:10 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Oct 27 '22 22:10 github-actions[bot]