genkit feat: added a way to wrap existing models with additional middleware

The idea is to be able to do things like

const smoothGemini15Flash = defineWrappedModel({
  name: 'smoothGemini15Flash',
  model: gemini15Flash,
  use: [
    retry({ maxRetries: 2 }),
    smoothStreaming({ cadence: 25, randomize: true })
  ],
});

await generate({
  prompt: prompt,
  model: smoothGemini15Flash,
})

Jun 22 '24 20:06 pavelgj

What's the reasoning behind this instead of your first PR? I can see instances where I'd want to use each - middleware more oriented at specifics of the prompt would make sense to pass with generate, but middleware that I want to use to set "baseline behavior" for all prompts makes more sense with this approach.

If the question is around observability/tracing, I think we have some work to do there - I think generate should be an action instead of a wrapper around the model action. Tool loop etc. are too complex to not be brought into the observability story and I think we need to refactor.

Jun 23 '24 16:06 mbleigh

What's the reasoning behind this instead of your first PR? I can see instances where I'd want to use each - middleware more oriented at specifics of the prompt would make sense to pass with generate, but middleware that I want to use to set "baseline behavior" for all prompts makes more sense with this approach.

If the question is around observability/tracing, I think we have some work to do there - I think generate should be an action instead of a wrapper around the model action. Tool loop etc. are too complex to not be brought into the observability story and I think we need to refactor.

The problem is with the playground -- middleware passed into the generate function directly is not accessible from the playground. This breaks several CUJs -- from simply being able to see the effect of the middleware in the Dev UI when playing with the model, to replayability of model call spans (not being able to replay the exact model call) when doing "Open in model playground" from a trace.

One way to work around it would be to treat middleware similar to how we do tools -- we can register middleware in the registry (defineMiddleware) and pass in middleware refs to the generate call, which can be recorded in the trace and model playground can pass those in by reference... but this is a big full-stack undertaking... but perhaps worth it and we should add it to our backlog.

Jun 23 '24 17:06 pavelgj

The problem is with the playground -- middleware passed into the generate function directly is not accessible from the playground. This breaks several CUJs -- from simply being able to see the effect of the middleware in the Dev UI when playing with the model, to replayability of model call spans (not being able to replay the exact model call) when doing "Open in model playground" from a trace.

One way to work around it would be to treat middleware similar to how we do tools -- we can register middleware in the registry (defineMiddleware) and pass in middleware refs to the generate call, which can be recorded in the trace and model playground can pass those in by reference... but this is a big full-stack undertaking... but perhaps worth it and we should add it to our backlog.

I think there's probably an "all of the above" here but I think it would be perfectly reasonable to ship direct-use middleware without playground support (you can still test it out by wrapping in a flow) and it would unlock the most flexibility for users. We could expand that to named/defined middleware as an "also" not an "instead of" which would potentially open up Dotprompt support:

middleware:
  - name: SomeMiddleware
    args: {a: 123}

Jun 27 '24 21:06 mbleigh

The problem is with the playground -- middleware passed into the generate function directly is not accessible from the playground. This breaks several CUJs -- from simply being able to see the effect of the middleware in the Dev UI when playing with the model, to replayability of model call spans (not being able to replay the exact model call) when doing "Open in model playground" from a trace. One way to work around it would be to treat middleware similar to how we do tools -- we can register middleware in the registry (defineMiddleware) and pass in middleware refs to the generate call, which can be recorded in the trace and model playground can pass those in by reference... but this is a big full-stack undertaking... but perhaps worth it and we should add it to our backlog.

I think there's probably an "all of the above" here but I think it would be perfectly reasonable to ship direct-use middleware without playground support (you can still test it out by wrapping in a flow) and it would unlock the most flexibility for users. We could expand that to named/defined middleware as an "also" not an "instead of" which would potentially open up Dotprompt support:
middleware:
  - name: SomeMiddleware
    args: {a: 123}

I very very strongly oppose the idea of direct use of middleware in the generate function. It undermines and breaks the core idea of trace + playground combo. It does it in very subtle way that will leave developers confused. I mean it won't throw errors or anything obvious, it will return different results (one way will apply middleware the other won't).

Jun 28 '24 00:06 pavelgj

I'm confused...if I supply an output schema that also returns different results than if I don't. What's the scenario where this causes confusion? If I call generate in a code context and pass it middleware, that middleware should show up in the trace (I think that generate itself should be an action instead of a weird pseudo-wrapper around model actions FWIW).

If I'm in a playground and can't supply middleware, it's obviously not going to have the effects of that middleware applied. If I define a flow that includes a generate call that adds middleware, then I can call that in the playground and get the middleware applied.

I think "everything you can do with generate you can do in the model playground" is a reasonably good ideal to have, but I don't think the overall experience of Genkit breaks if there are some advanced things you can do in code that you can't do in the model playground.

Jun 28 '24 00:06 mbleigh

In my dream scenario, a trace with middleware would have steps that look something like:

generate({
  model: gemini15Flash,
  prompt: "Do cool stuff.",
  use: [myMiddleware, mySecondMiddleware],
});

generate
- myMiddleware
  - mySecondMiddleware
    - vertexai/gemini-1.5-flash

Jun 28 '24 00:06 mbleigh

In my dream scenario, a trace with middleware would have steps that look something like:
generate({
  model: gemini15Flash,
  prompt: "Do cool stuff.",
  use: [myMiddleware, mySecondMiddleware],
});
generate

myMiddleware

mySecondMiddleware

vertexai/gemini-1.5-flash

You are definitely thinking outside the box here. It's not how we do it today. I mean, yeah, if we break up the trace for the generate function like that it solves the main problem I'm talking about, but the more middleware you put directly into the generate function the less useful Genkit tooling becomes.

Where's the line? How do you determine what's OK to call directly via generate and what's better exposed to the Dev UI? I mean, if you have some custom tool calling... obviously you want that to be available in the Dev UI. What about retry logic? Depends... but if it's some advanced self healing (feed the error back to the llm and ask to correct the previos output)... again, totally worth being available in the Dev UI. It's probably an exception rather than the rule when exposing something in the Dev UI as a model primitives is not useful.

If the developer wants to build advanced features they should be building them on top of generate, not into it. Prefer composition of primitives.

Jun 28 '24 03:06 pavelgj

What's the latest status of this conversation?

Jul 19 '24 00:07 chrisraygill

I think at the moment we're kind of at an impasse. I see middleware as being one of the most powerful yet simple-to-consume abstractions in software but I think my ❤️ is not shared by most people. The in-use comparison is basically:

await generate({
  prompt: promptOrSomething,
  use: [myMiddleware({option1: true}), otherMiddleware()]
});

vs.

const before1 = await myNotMiddleware({option1: true}, promptOrSomething);
const before2 = await otherMiddleware(before1);
const unprocessedResult = await generate(before2);
// maybe you also did something after the request in one of the middlewares
const result = await myNotMiddlewareAfter({option1: true}, unprocessedResult);

Middleware is robustly configurable and allows for complex transformation in a simple package, but is harder to conceptualize for the people who write it (until they manage to completely grok the pattern at which point it becomes very straightforward, but this admittedly is not most devs).

the more middleware you put directly into the generate function the less useful Genkit tooling becomes

This is only true if we don't decide to make middleware a key part of Genkit including the tooling. I think adding middleware as a registry-backed primitive with full support in the DevUI is totally doable and would actually be really cool! Imagine if the model playground just let you add extra middleware like configurable retries or any number of things and then you can just export it to a dotprompt or code.

Jul 19 '24 02:07 mbleigh

+1 to a registry-backed primitive. I haven not yet heard it discussed, but I think there are some nice possibilities for 3P middleware if we open it up. There are "generative" behaviors that not every model implements (such as system prompt) that we already express via middleware, and recently wrapping models w/ tool calling came up as something we should probably have (e.g. for ollama). Wouldn't it be cool if that were a middleware that could be retroactively applied by the developer to models that don't support it out of the box.

Jul 22 '24 13:07 MichaelDoyle

There are 2 ways to use middleware:

in a way that adds it to the action and can be played around from the Dev UI as a model (this PR).
directly in the generate function.

I think those two ways are kind of orthogonal. I think (1) should be at least one of the ways, I don't see why not. I believe (2) has inherent issues, it's a slippery slope to making Dev UI useless over time. I'd like to avoid it. But we can continue the debate. In theory both those can be available.

Jul 22 '24 13:07 pavelgj