Add "human-in-the-loop" or "out-of-band" interaction capability to support input/action not known ahead of time
We have discussed a few times now the ability to allow a given step, based on dynamic response values from previous steps, to query the consumer/client/end user for further input. This would likely be a hint to tooling that, at that point, it needs to pull in additional input from some source, whether that's a GUI application popping up a dialog to ask the user for input, or a request sent to an AI or mock service to return the needed data.
An example might be a workflow that retrieves a series of airlines/flights for a range of destinations but requires a user to select one before continuing. There could be any number of right choices, and the selection requires input that was not known when the workflow started; likewise, the responses from earlier steps don't by themselves determine one specific airline/flight, so the next steps (booking that flight, then perhaps paying for it) can't proceed without it. Similarly, the payment step might need to know which credit card to use from a list of them, or might require the user to enter the card details at that moment (e.g., card data isn't stored, so the user must enter it right then).
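To make that concrete, here is a rough sketch of what such a pause could look like as an extension. Everything here is hypothetical and only for illustration: the `x-arazzo-pause` keyword, its fields, the `$x-pause-response` expression, and the operation IDs.

```yaml
steps:
  - stepId: searchFlights
    operationId: findFlights          # hypothetical operation
    outputs:
      flights: $response.body#/flights
  - stepId: chooseFlight
    # Hypothetical extension: tooling pauses here and asks the
    # consumer (human, AI, mock service) to pick one flight.
    x-arazzo-pause:
      type: selection
      prompt: "Select a flight to book"
      displayData: $steps.searchFlights.outputs.flights
    outputs:
      flightId: $x-pause-response.flightId
  - stepId: bookFlight
    operationId: bookFlight           # hypothetical operation
    parameters:
      - name: flightId
        in: path
        value: $steps.chooseFlight.outputs.flightId
```

Note that the pause step has no operationId at all, which is one reason the current spec can't express this without either an extension or a spec change.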
Another use case is getting an S3 pre-signed URL to upload a file.
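The interesting part there is the upload itself: a plain HTTP PUT to a URL that only exists at runtime, so there is no operation in any OpenAPI description to point a step at. A hypothetical sketch (the `x-arazzo-request` keyword and the operation ID are invented here):

```yaml
steps:
  - stepId: getUploadUrl
    operationId: createPresignedUrl   # hypothetical operation
    outputs:
      uploadUrl: $response.body#/url
  - stepId: uploadFile
    # Hypothetical extension: an out-of-band request to a URL taken
    # from a previous response, not described by any OpenAPI document.
    x-arazzo-request:
      method: PUT
      url: $steps.getUploadUrl.outputs.uploadUrl
      body: $inputs.fileContents
```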
A use case that I just encountered is onboarding an IoT device: some HTTP API calls are needed, but there is a BLE (Bluetooth Low Energy) step in the middle that provides information for subsequent steps.
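A sketch of that flow, with the BLE exchange modeled as an opaque out-of-band pause (again, every extension field and operation ID here is hypothetical):

```yaml
steps:
  - stepId: registerDevice
    operationId: createDeviceRecord   # hypothetical operation
    outputs:
      provisioningToken: $response.body#/token
  - stepId: bleProvision
    # Hypothetical pause: tooling hands the token to a BLE client
    # and waits for it to report back the device identity.
    x-arazzo-pause:
      type: out-of-band
      prompt: "Provision the device over BLE using the token"
      displayData: $steps.registerDevice.outputs.provisioningToken
    outputs:
      deviceId: $x-pause-response.deviceId
  - stepId: activateDevice
    operationId: activateDevice       # hypothetical operation
    parameters:
      - name: deviceId
        in: path
        value: $steps.bleProvision.outputs.deviceId
```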
An argument in favor of this feature is that it allows folks to use Arazzo prior to Arazzo supporting more technologies for steps (AsyncAPI, BLE, whatever). Some of those (like AsyncAPI) will no doubt get supported, but others may not ever make sense to formally support.
@handrews funny you bring up IoT. I was looking at the async stuff we are talking about adding and was thinking IoT devices would likely fit well within this scope as well. Having a workflow that uses a standard like Arazzo to tie together API calls + IoT devices might be pretty slick. Not sure anything like that is available at this time.
Other implementations for inspiration:
- https://langchain-ai.github.io/langgraph/how-tos/human_in_the_loop/add-human-in-the-loop/
- https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html
- https://cloud.google.com/workflows/docs/tutorials/callbacks-firestore
Also:
- https://learn.microsoft.com/en-us/answers/questions/2168187/how-to-handle-the-human-in-the-loop-for-concurrent
I'd like for us to consider using the Arazzo extension mechanism initially to suss out a solution to this. I think it's a perfect way to start adding support without breaking (or changing) the spec. It would, however, require some tooling vendor(s) to pick up the extensions and determine how best to utilize out-of-band input (human, AI, etc.). I assume some sort of runtime engine would also be needed to verify that it can pause, wait for whatever mechanism is implemented to return a response, and then continue the workflow execution.
Example:
```yaml
steps:
  - stepId: selectPet
    operationId: getAvailablePets
    # Extension: tooling pauses here and waits for out-of-band input
    x-arazzo-pause:
      type: selection
      prompt: "Choose a pet from the available options"
      displayData: $response.body#/pets
      # Schema that the out-of-band response must satisfy
      inputSchema:
        type: object
        properties:
          petId:
            type: string
          reason:
            type: string
      # Seconds to wait for input before falling back
      timeout: 3600
      onTimeout:
        type: goto
        stepId: showDefaultPets
    outputs:
      selectedPet: $x-pause-response.petId
```
Thoughts?
Feels like we have 2 distinct goals:
- human in the loop / wait for input
- making out-of-band requests (e.g. GET/PUT to a URL from a response payload)
@tgeens we discussed this today and wanted to respond to you. We originally called it "out of band" and changed it to "human in the loop", so my apologies for confusing the issue. What we've come away with so far is a discussion around whether supporting AsyncAPI will cover the needs of "human in the loop", but we may still move forward with an extension (perhaps as I posted above) to a) see how it works out and b) serve as a potentially more verbose option than AsyncAPI (without removing the need for AsyncAPI support).
As we're all moving more towards AI agents and agentic consumption, "human" may not be the proper term; I foresee a time when a pause/async task would get its context/responses from an AI or some other process that isn't "human in the loop" per se.
We will be discussing a bit more in our call in two weeks if you would like to join.
Thanks for reaching out, but we really are talking about different goals.
How about "actor-in-the-loop"?