While, Do-while loop implementation for fetching paginated data from an API

Open nyamathshaik opened this issue 8 months ago • 50 comments

What would you like to be added?

Support for while or do-while loop constructs to simplify workflows that need to fetch paginated data from APIs until a condition is met (e.g., no more pages to fetch).

Proposal(s):

Currently, implementing workflows that require repetitive API calls until a condition is met (e.g., while nextPageToken is present) is cumbersome and verbose using the existing specification. There is no native support for loop constructs that check a condition after each iteration (i.e., a do-while pattern).

Proposal:

  • Introduce a native while or do-while loop construct within the Serverless Workflow specification.
  • Support a conditional expression (JQ) that evaluates whether the loop should continue.
  • Allow a loopBody or array of steps to be executed within the loop.
  • Optional: support max iterations to prevent infinite loops.

Example syntax (pseudo-DSL):

document:
  dsl: '1.0.0'
  namespace: test
  name: paginated-fetch-example
  version: '0.1.0'

do:
  - fetchPaginatedData:
      while: .hasNextPage == true
      postConditionCheck: false # Enables do-while
      maxIterations: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults
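
For comparison, here is roughly what the pseudo-DSL above would do if written by hand. This is only a sketch: `Page`, `FetchPage` and `fetchAllPages` are hypothetical names, and the API call is modelled as a synchronous function for brevity.

```typescript
// Hypothetical page shape; the field name mirrors the pseudo-DSL above.
interface Page<T> {
  items: T[];
  nextPageToken?: string;
}

// Hypothetical stand-in for the `getPageData` call (synchronous for brevity).
type FetchPage<T> = (pageToken?: string) => Page<T>;

// Do-while pagination: the body runs once before the condition is checked,
// and `maxIterations` caps the number of calls to avoid an infinite loop.
function fetchAllPages<T>(fetchPage: FetchPage<T>, maxIterations = 100): T[] {
  const accumulated: T[] = [];
  let pageToken: string | undefined = undefined;
  let iterations = 0;
  do {
    const page: Page<T> = fetchPage(pageToken);
    accumulated.push(...page.items);
    pageToken = page.nextPageToken;
    iterations += 1;
  } while (pageToken !== undefined && iterations < maxIterations);
  return accumulated;
}
```

The condition is checked after the first call because the first page has to be fetched before any page token exists, which is why a post-condition (do-while) form fits pagination.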

Alternative(s):

Currently, the same can be achieved by:

  • Manually chaining states with transition and using a switch to re-enter a state based on a condition.
  • Using recursion with workflow calls, which increases complexity and reduces readability.
  • Implementing a custom function or orchestrator outside of the workflow itself.

These approaches are harder to read and maintain, and become error-prone once retry logic and timeouts are introduced.

Additional info:

This feature would greatly improve:

  • Workflows that deal with pagination (API data fetching, batch processing).
  • Polling use cases where the data becomes available over time.
  • Any retry-until-success or loop-until-condition patterns.

Let me know if you'd like me to open a PR to help explore this idea! Or do you think this can be achieved with the current for task? @ricardozanini @cdavernas

Community Notes

  • Please vote by adding a 👍 reaction to the feature to help us prioritize.
  • If you are interested in working on this feature, please leave a comment.

nyamathshaik avatar Apr 13 '25 16:04 nyamathshaik

However, it appears that the current for task can already be made to operate as a traditional while loop in Serverless Workflow.

By setting the for.in property of a For task to an empty collection, the while property takes over as the primary control mechanism for the loop, facilitating the use of both while and do-while loop patterns within Serverless Workflows.

Example:

- name: WhileLoopExample
  type: while
  for:
    in: ${ [] }
  while: ${ .counter < 5 }
  maxIterations: 100
  do:
    - name: IncrementCounter
      type: set
      data:
        counter: ${ .counter + 1 }
    - name: LogMessage
      type: run
      run:
        script:
          lang: javascript
          code: |
            console.log("Counter is: " + context.counter);

Example:

- name: DoWhileLoopExample
  type: do-while
  for:
    in: ${ [] }
  while: ${ .counter < 5 } # false initially
  maxIterations: 100
  do:
    - name: IncrementCounter
      type: set
      data:
        counter: ${ .counter + 1 }
    - name: LogMessage
      type: run
      run:
        script:
          lang: javascript
          code: |
            console.log("This runs even though .counter < 5 is false!");

nyamathshaik avatar Apr 14 '25 04:04 nyamathshaik

May I know whether for is mandatory, or can we get rid of it?

for:
  in: ${ [] }

According to the TypeScript SDK it's optional, but that isn't mentioned anywhere in the spec. Can you clarify, please?

https://github.com/serverlessworkflow/sdk-typescript/blob/main/src/lib/generated/definitions/specification.ts#L252

nyamathshaik avatar Apr 14 '25 06:04 nyamathshaik

@nyamathshaik Yes, for is mandatory, and it also serves as a discriminator. In addition, your suggestion would not work because the enumeration would stop before evaluating the while condition, as the array is empty. In that context, while is an exit condition, not a continuation one; treating it as the latter would create many unpleasant side effects (one being a null reference for the enumerated item).

Therefore, your initial proposal looks to me to be the most adequate, and would also restore a construct we had in previous versions.
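
To illustrate the point about the empty collection, here is a simplified model of the behaviour described in this comment. It is a sketch based on my reading of the comment, not the reference runtime; `runForTask` and its parameters are hypothetical names.

```typescript
// Simplified model of a `for` task: enumerate `items`, and before each
// iteration evaluate the optional `whileCondition` as an early-exit guard.
// Returns the number of times the body was executed.
function runForTask<T>(
  items: T[],
  body: (item: T, index: number) => void,
  whileCondition: (item: T, index: number) => boolean = () => true,
): number {
  let executed = 0;
  for (let index = 0; index < items.length; index++) {
    const item = items[index];
    // The guard is only reached when there is an item to enumerate.
    if (!whileCondition(item, index)) break;
    body(item, index);
    executed += 1;
  }
  return executed;
}
```

With `items` set to an empty array the loop body is never entered, so `whileCondition` is never evaluated and cannot drive the loop on its own.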

cdavernas avatar Apr 14 '25 06:04 cdavernas

@cdavernas Thank you for the feedback. I was considering something along the lines of the following, but I am still open to suggestions. Please review it and let me know if it looks good.

export type LoopTask = TaskBase & {
    /**
     * The type of loop to use.
     */
    loop?: 'for' | 'while' | 'do-while';
    /**
     * The configuration for the for loop.
     */
    for?: ForTaskConfiguration;
    /**
     * The configuration or condition for the while loop.
     */
    while?: WhileTaskConfiguration | string;
    /**
     * The tasks to execute on each iteration of the for, while, or do-while loop.
     */
    do?: TaskList;
    [k: string]: unknown;
};
export interface WhileTaskConfiguration {
    /**
     * A runtime expression that decides whether the loop should continue.
     */
    condition?: string;
    /**
     * The maximum number of iterations, used to prevent infinite loops.
     */
    maxIterations?: number;
    /**
     * The name of the variable used to store the index of the current iteration.
     */
    at?: string;
}

Benefits of Proposed Approach

  • Backward Compatibility: Existing ForTask implementation remains unchanged.
  • Enhanced Clarity: Clear differentiation between for, while, and do-while loops.
  • Safety Mechanism: The addition of maxIterations prevents infinite loops.
  • Extensibility: Future improvements can be made without affecting existing task definitions.

This proposed solution provides a structured approach to implementing loop tasks while maintaining compatibility with Serverless Workflow specifications.

nyamathshaik avatar Apr 14 '25 06:04 nyamathshaik

@nyamathshaik First of all, thanks for your awesome work!

Second, I have a couple of remarks regarding the proposal you made in the related PR:

do:
  - fetchPaginatedData:
      while: .hasNextPage == true
      postConditionCheck: false # Enables do-while behavior
      maxIterations: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

  • The while keyword will need to change, either in the task you propose to add or in the for task, or discrimination will no longer be (easily) possible (i.e. checking for the presence of the while keyword, which as a primary identifier must not be used by any other task)
  • I'm personally not a fan of the postConditionCheck and maxIterations terminology, which goes against the design guidelines. I'm convinced we can come up with imperative, actionable, single-word names instead!

cdavernas avatar Apr 14 '25 10:04 cdavernas

@cdavernas Thanks a lot for the feedback and kind words! 🙌 Really appreciate you taking the time to review the proposal.

Totally hear you on the concerns around the while keyword clash and the naming of postConditionCheck and maxIterations.

For the keyword conflict: I’m happy to revise the structure to avoid any ambiguity with for tasks. Would something like below look good?

do:
  - fetchPaginatedData:
      loop: while # or do-while
      condition: .hasNextPage == true
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

Open to suggestions! I'm happy to iterate further on the proposal based on whatever naming conventions or structural guidance the maintainers feel would fit best.

Looking forward to your thoughts!

nyamathshaik avatar Apr 14 '25 11:04 nyamathshaik

Or how about simply renaming the properties of the initial proposal, as below:

do:
  - fetchPaginatedData:
      condition: .hasNextPage == true
      isDoWhile: true # Enables do-while behavior
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

nyamathshaik avatar Apr 14 '25 11:04 nyamathshaik

Why not:

 - name: WhileLoopExample
   while: ${ .counter < 5 }
   do:
      - name: IncrementCounter
        type: set
        data:
          counter: ${ .counter + 1 }
      - name: LogMessage
        type: run
        run:
          script:
            lang: javascript
            code: |
              console.log("Counter is: " + context.counter);

And

 - name: WhileLoopExample
   do:
      - name: IncrementCounter
        type: set
        data:
          counter: ${ .counter + 1 }
      - name: LogMessage
        type: run
        run:
          script:
            lang: javascript
            code: |
              console.log("Counter is: " + context.counter);
   while: ${ .counter < 5 }

The challenge is to interpret this in the JSON Schema.

Essentially, you want a for without in, correct?

ricardozanini avatar Apr 14 '25 17:04 ricardozanini

We can add the limit keyword, no problem. But yes, we must avoid camel-cased and compound words.

ricardozanini avatar Apr 14 '25 17:04 ricardozanini

The first is a do task with an optional while parameter. The other is a while task. I think we can accommodate this in the JSON schema with backward compatibility for 1.1.0.

ricardozanini avatar Apr 14 '25 17:04 ricardozanini

Thanks for reviewing, @ricardozanini. JSON Schema doesn’t support positional logic, so I don't think https://github.com/serverlessworkflow/specification/issues/1096#issuecomment-2802432507 actually works (i.e., if while appears before, interpret it one way; if after, interpret it differently).

Also, this creates more ambiguity in the semantics (while appearing before or after do: changes the behavior).
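
A small sketch of why position cannot be validated: JSON objects (and YAML mappings) are unordered collections of key/value pairs, so a validator only sees which keys are present. The helper names `sameMembers` and `hasRequired` are hypothetical; `hasRequired` mimics a JSON Schema `required` check.

```typescript
// Two task definitions that differ only in the order of their keys.
const whileFirst: Record<string, unknown> = JSON.parse(
  '{"while": "${ .counter < 5 }", "do": []}',
);
const whileLast: Record<string, unknown> = JSON.parse(
  '{"do": [], "while": "${ .counter < 5 }"}',
);

// Order-insensitive comparison: same keys, same values.
function sameMembers(a: Record<string, unknown>, b: Record<string, unknown>): boolean {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  return (
    keysA.length === keysB.length &&
    keysA.every(
      (key, i) => key === keysB[i] && JSON.stringify(a[key]) === JSON.stringify(b[key]),
    )
  );
}

// Presence check in the spirit of JSON Schema's `required` keyword.
function hasRequired(value: Record<string, unknown>, required: string[]): boolean {
  return required.every((key) => key in value);
}
```

Both documents pass or fail any presence-based rule together, so "while before do" and "while after do" would need different keys (or a different nesting) to be told apart.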

nyamathshaik avatar Apr 14 '25 19:04 nyamathshaik

That said, I am still aligned with the approaches below:

Approach 1:

do:
  - fetchPaginatedData:
      loop: while # or do-while
      condition: .hasNextPage == true
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

Approach 2:

do:
  - fetchPaginatedData:
      condition: .hasNextPage == true
      isDoWhile: true # Enables do-while behavior
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

Please let me know if you have any further suggestions; an entirely new suggestion is fine too. That way we can agree on the final solution and I can work on raising a PR.

@ricardozanini @cdavernas

nyamathshaik avatar Apr 14 '25 19:04 nyamathshaik

@nyamathshaik Am I missing something, or are approaches 1 and 2 exactly the same in the comment above?

Otherwise, IMHO the last suggestions are far better than the initial one, from a semantic point of view. I'm not a big fan of the loop property though, but it might be the sole elegant choice we have, given I'm failing to find alternatives to the while of the for task.

cdavernas avatar Apr 14 '25 20:04 cdavernas

@cdavernas @nyamathshaik Golang doesn't have while and supports both scenarios. Why not just remove the in?

https://www.programiz.com/golang/while-loop https://yourbasic.org/golang/do-while-loop/

I'm not a fan of this loop attribute. Let's stay on one task to perform iterations only. Removing the in requirement from for implies a while condition. We can make while required when in is blank.

ricardozanini avatar Apr 14 '25 21:04 ricardozanini

@ricardozanini The problem is that it would also mean removing for.each, for.in and for.at, which are all useless in the context of a while task. That de facto removes the for keyword and thus destroys the actual construct.

To conclude, I think @nyamathshaik is right to (re)introduce (we had that before 1.x.x) a new dedicated task. Motivations are:

  • Even if there's no while loop in Golang, almost all other languages have one, and it will IMHO feel more convenient to most users
  • Less, if any, collateral damage: nothing is broken, everything remains backward compatible, as we are just adding functionality
  • It is semantically cleaner IMHO

cdavernas avatar Apr 14 '25 22:04 cdavernas

Thanks a lot for taking the time to review this request, @ricardozanini @cdavernas.

I agree with @cdavernas: removing the in requirement from for might destroy the actual construct.

That said, we have the 3 approaches below to choose from. Please help confirm.

Approach 1:

do:
  - fetchPaginatedData:
      loop: while # or do-while
      condition: .hasNextPage == true
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

Approach 2:

do:
  - fetchPaginatedData:
      condition: .hasNextPage == true
      isDoWhile: true # Enables do-while behavior
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

As per @ricardozanini's suggestion:

Approach 3:

do:
  - fetchPaginatedData:
      while: .hasNextPage == true # using the current while property from `for` task
      isDoWhile: true # Enables do-while behavior
      limit: 100
      do:
        - fetchPage:
            call: getPageData
            input:
              pageToken: .nextPageToken
            output: .fetchedData
        - accumulateData:
            run: mergeResults
            input:
              newData: .fetchedData
              existingData: .accumulatedResults
            output: .accumulatedResults

nyamathshaik avatar Apr 15 '25 05:04 nyamathshaik

I think Zanini's proposal is the most intuitive one. To get around the JSON Schema challenge, we can replace the second while: with repeatWhile:, so they look this way:

Standard while task

- WhileLoopExample:
    while: ${ .counter < 5 }
    do:
      - name: IncrementCounter
        type: set
        data:
          counter: ${ .counter + 1 }
      - name: LogMessage
        type: run
        run:
          script:
            lang: javascript
            code: console.log("Counter is: " + context.counter);

Do while

- DoWhileExample:
    do:
      - name: IncrementCounter
        type: set
        data:
          counter: ${ .counter + 1 }
      - name: LogMessage
        type: run
        run:
          script:
            lang: javascript
            code: console.log("Counter is: " + context.counter);
    repeatWhile: ${ .counter < 5 }

fjtirado avatar Apr 15 '25 10:04 fjtirado

@fjtirado As explained above, @ricardozanini's proposal is not doable, as it would either force the removal of the for task, or collide with it

cdavernas avatar Apr 15 '25 11:04 cdavernas

I do not think so. I bet you can have a while task and a for task with a while. What you cannot have is two whiles at the same level in different positions, which is why I replaced one of Ricardo's whiles with a repeatWhile. This is a tricky one ;)

fjtirado avatar Apr 15 '25 11:04 fjtirado

Not really: in one case for is required but while is optional; in the other while is required, but for is excluded. In other words, it de facto becomes a oneOf of... two different tasks. Therefore, I advocate for turning it into what it is: another task. Also, it's confusing on the documentation side, where a for task would also be a non-for task (i.e. while). To conclude, I feel the shortcut you propose actually complicates things and breaks backward compatibility, just for the sake of not adding a new task type.
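
The "oneOf of two different tasks" remark can be pictured with a small union type. The shapes below are hypothetical simplifications for illustration, not the SDK's generated definitions.

```typescript
// Hypothetical, simplified task shapes (not the SDK's generated types).
interface ForTask {
  for: { each?: string; in: string; at?: string }; // required
  while?: string; // optional early-exit condition
  do: unknown[];
}

interface WhileTask {
  while: string; // required
  do: unknown[];
  // no `for` property allowed here
}

// Once both shapes share the `while` key, the union is the only honest type.
type LoopTask = ForTask | WhileTask;

// `while` is present in both shapes, so it cannot discriminate on its own;
// the only reliable signal left is whether `for` is present.
function classify(task: LoopTask): "for" | "while" {
  return "for" in task ? "for" : "while";
}
```

In other words, the discriminator silently moves from "has while" to "has for or not", which is the two-task split described in the comment above.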

cdavernas avatar Apr 15 '25 11:04 cdavernas

@fjtirado If we want to follow @ricardozanini's approach, we will have to use the existing while property of the for task. Currently, for and for.in are mandatory, hence we would need to put while under the for task.

However, @ricardozanini also suggested making for.in optional and while mandatory when for.in is blank. But that would mean removing the for keyword entirely, as for.each, for.in and for.at are all useless in that case, thus destroying the actual construct and breaking backward compatibility.

Adding a new task type is the easiest and safest route we can take IMHO also.

nyamathshaik avatar Apr 15 '25 11:04 nyamathshaik

Ok, let me rephrase. What I suggested (please take a closer look at the example I used) is a new while task for the regular while. To emulate do-while, add a new optional property (repeatWhile) to the do task so you can repeat the do. The for task remains unchanged, and we do not need a boolean flag on the new while to emulate the do-while.

fjtirado avatar Apr 15 '25 13:04 fjtirado

@fjtirado The while keyword will need to change, either in the task you propose to add or in the for task, or discrimination will no longer be (easily) possible (i.e. checking for the presence of the while keyword, which as a primary identifier must not be used by any other task)

nyamathshaik avatar Apr 15 '25 14:04 nyamathshaik

With that said, we now have 4 approaches to choose from. @ricardozanini @cdavernas @fjtirado

My preference is Approach 2, followed by 1. WDYT?

✅ Approach 1: Using explicit loop: while | do-while keyword

do:
  - WhileLoopExample:
      loop: while # or do-while
      condition: ${ .counter < 5 }
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

✅ Approach 2: Using condition as the task keyword and isDoWhile as a flag to determine whether it's a while or a do-while loop

do:
  - WhileLoopExample:
      condition: ${ .counter < 5 }
      isDoWhile: true
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

✅ Approach 3: Using the current while property from the for task + isDoWhile: true as a flag to determine whether it's a while or a do-while loop

do:
  - WhileLoopExample:
      while: ${ .counter < 5 }  # using the current while property from `for` task
      isDoWhile: true
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

✅ Approach 4: Having 2 different tasks, Using current while property from for task and a new task called do-while

do:
  - WhileLoopExample:
      while: ${ .counter < 5 }  # using the current while property from `for` task
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

do:
  - DoWhileLoopExample:
      do-while: ${ .counter < 5 }  # new do-while task
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

nyamathshaik avatar Apr 15 '25 15:04 nyamathshaik

I'd hate to add another task to do loops since we already have one. A:

document:
  dsl: '1.0.0'
  namespace: default
  name: for
  version: '1.0.0'
do:
  - myDoLoop:
      for:
        while: ${ .counter < 5 }
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

Won't break anything and will add the possibility to keep the existing for task as it is, doing its loops. Having a strong and popular language doing the same is a compelling argument to the community.

If we were to add a new task, none of the approaches is compelling to me. What we can do to keep the same philosophy is:

document:
  dsl: '1.0.0'
  namespace: default
  name: while
  version: '1.0.0'
do:
  - myDoLoop:
      while: ${ .counter < 5 }
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

And repeat:

document:
  dsl: '1.0.0'
  namespace: default
  name: repeat
  version: '1.0.0'
do:
  - myDoLoop:
      repeat:
        do:
          - name: IncrementCounter
            type: set
            data:
              counter: ${ .counter + 1 }
          - name: LogMessage
            type: run
            run:
              script:
                lang: javascript
                code: console.log("Counter is: " + context.counter);
        until: ${ .counter < 5 }

Yes, we will introduce two additional tasks, but it's closer to the language philosophy we have now, and gets rid of these boolean/enum attributes that are not fluent.
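
To make the difference between the two proposed tasks concrete, here is a sketch of the semantics they would presumably have. It assumes `while` is a continuation condition checked before each pass and `until` is an exit condition checked after each pass, as in most languages; the thread has not pinned this down. `runWhile`, `runRepeatUntil` and the `limit` parameter are hypothetical.

```typescript
// Pre-condition loop: `condition` is checked before each run of `body`,
// so the body may never run. `limit` is a safety cap on iterations.
function runWhile(condition: () => boolean, body: () => void, limit = 100): number {
  let runs = 0;
  while (runs < limit && condition()) {
    body();
    runs += 1;
  }
  return runs;
}

// Post-condition loop: `body` always runs once, then repeats until `exit`
// evaluates to true (assumption: `until` is an exit condition).
function runRepeatUntil(exit: () => boolean, body: () => void, limit = 100): number {
  let runs = 0;
  do {
    body();
    runs += 1;
  } while (runs < limit && !exit());
  return runs;
}
```

Under that assumption, the repeat..until counterpart of `while: ${ .counter < 5 }` would be `until: ${ .counter >= 5 }`. An `until: ${ .counter < 5 }` starting from a counter of 0 would stop after the first pass.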

ricardozanini avatar Apr 15 '25 15:04 ricardozanini

@nyamathshaik regarding this:

Thanks for reviewing @ricardozanini JSON Schema doesn’t support positional logic. So I don't think https://github.com/serverlessworkflow/specification/issues/1096#issuecomment-2802432507 actually works. (i.e., if while appears before, interpret it one way; if after, interpret differently).

Yes, it would work because the do task would have a new while attribute. The position was just to make it more intelligible. But I agree that it can be confusing. See my comment above.

ricardozanini avatar Apr 15 '25 15:04 ricardozanini

@ricardozanini This sounds good to me. But wouldn't the while conflict with the while property in the for task?

repeat..until looks absolutely fine for do-while

document:
  dsl: '1.0.0'
  namespace: default
  name: while
  version: '1.0.0'
do:
  - myDoLoop:
      while: ${ .counter < 5 }
      do:
        - name: IncrementCounter
          type: set
          data:
            counter: ${ .counter + 1 }
        - name: LogMessage
          type: run
          run:
            script:
              lang: javascript
              code: console.log("Counter is: " + context.counter);

And repeat:


document:
  dsl: '1.0.0'
  namespace: default
  name: repeat
  version: '1.0.0'
do:
  - myDoLoop:
      repeat:
        do:
          - name: IncrementCounter
            type: set
            data:
              counter: ${ .counter + 1 }
          - name: LogMessage
            type: run
            run:
              script:
                lang: javascript
                code: console.log("Counter is: " + context.counter);
        until: ${ .counter < 5 }

nyamathshaik avatar Apr 15 '25 15:04 nyamathshaik

@cdavernas @ricardozanini Can you please confirm the final proposal, so I can go ahead and work on the PR?

nyamathshaik avatar Apr 15 '25 17:04 nyamathshaik

@cdavernas will probably review this tomorrow.

ricardozanini avatar Apr 15 '25 21:04 ricardozanini

@cdavernas any update on this please?

nyamathshaik avatar Apr 16 '25 08:04 nyamathshaik