quarkus-langchain4j

Proper CDI scope for conversation

Open cescoffier opened this issue 1 year ago • 44 comments

Memory is required to keep track of the previous messages in a conversation. It's also required when using tools.

This issue is about addressing two problems:

  • how to configure the memory size - even a simple message count may not be enough, as the model's context window can still be exceeded (especially when using RAG). Eviction is also tricky.
  • how to properly define the boundary of the memory - e.g. the web socket session.

Of course, the memory storage must be per conversation / user and not shared across the whole application.

An idea is to define a new CDI scope that would handle the second issue.

cescoffier avatar Nov 21 '23 06:11 cescoffier
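The two bullets above can be sketched in plain Java, independent of any CDI scope. Everything here (class and method names) is hypothetical, not the extension's API; a real implementation would likely need a token-based budget rather than a naive message count, and the eviction boundary would be driven by the owning session:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: chat memory keyed by conversation id, with a
// simple message-count eviction policy (oldest messages dropped first).
class ConversationMemoryStore {
    private final int maxMessages;
    private final Map<String, Deque<String>> memories = new ConcurrentHashMap<>();

    ConversationMemoryStore(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    void add(String conversationId, String message) {
        Deque<String> memory = memories.computeIfAbsent(conversationId, id -> new ArrayDeque<>());
        synchronized (memory) {
            memory.addLast(message);
            while (memory.size() > maxMessages) {
                memory.removeFirst(); // naive eviction; a real impl would count tokens
            }
        }
    }

    List<String> messages(String conversationId) {
        Deque<String> memory = memories.getOrDefault(conversationId, new ArrayDeque<>());
        synchronized (memory) {
            return new ArrayList<>(memory);
        }
    }

    // The "boundary" question: when the owning session ends, the memory must go too.
    void endConversation(String conversationId) {
        memories.remove(conversationId);
    }
}
```

The proposed CDI scope would essentially decide when `endConversation` gets called, so the store never outlives the user's session.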

We need this very very much :)

geoand avatar Nov 21 '23 16:11 geoand

Let's start with a specific implementation of memory for Websockets and see how far that gets us.

geoand avatar Nov 29 '23 13:11 geoand

What about calling it @ConversationScoped ;o)

agoncal avatar Dec 06 '23 05:12 agoncal

@agoncal That's exactly the problem. In addition,

  1. we would need "session" support (Ladislav is working on this).
  2. a user may start multiple conversations concurrently, making things slightly more complicated.

cescoffier avatar Dec 06 '23 06:12 cescoffier

I believe that a new well-defined scope would be more appropriate. I do remember the hard times trying to fix all the bugs related to the conversation context in Weld.

mkouba avatar Dec 06 '23 07:12 mkouba

@mkouba just so we can be aware, what sort of issues were there? Asking because I am sure similar problems will arise in this context

geoand avatar Dec 06 '23 07:12 geoand

@mkouba just so we can be aware, what sort of issues were there? Asking because I am sure similar problems will arise in this context

Well, I don't remember all the details, but I can try to walk through the Weld issues/codebase in order to identify the problematic parts. I do remember that many issues were related to concurrent access and lifecycle (such as "an HttpSession is destroyed in the middle of request processing"). We also had to initialize the context lazily because of some character encoding issues (see http://weld.cdi-spec.org/documentation/#3). You also need a mechanism to prune long-running conversations that were not explicitly ended (i.e. timed out). Anyway, the conversation context is bound to the Servlet API, so at the very least it would be confusing if we supported this scope on top of Vert.x.

mkouba avatar Dec 06 '23 08:12 mkouba
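The pruning concern mentioned above can be illustrated with a minimal, container-free sketch (all names hypothetical): track a last-accessed timestamp per conversation and evict anything idle longer than the timeout.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of timeout-based pruning for long-running
// conversations that were never explicitly ended.
class ConversationReaper {
    private final long timeoutMillis;
    private final Map<String, Long> lastAccessed = new ConcurrentHashMap<>();

    ConversationReaper(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record activity on a conversation; the clock is passed in for testability.
    void touch(String conversationId, long nowMillis) {
        lastAccessed.put(conversationId, nowMillis);
    }

    // Would typically run on a scheduled executor; returns the number pruned.
    int prune(long nowMillis) {
        int[] pruned = {0};
        lastAccessed.entrySet().removeIf(e -> {
            boolean expired = nowMillis - e.getValue() > timeoutMillis;
            if (expired) pruned[0]++;
            return expired;
        });
        return pruned[0];
    }

    boolean isActive(String conversationId) {
        return lastAccessed.containsKey(conversationId);
    }
}
```

In a real container, pruning a conversation would also have to destroy the beans held in its context, which is where the concurrency issues Weld hit (e.g. pruning in the middle of request processing) come from.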

Thanks for the info!

geoand avatar Dec 06 '23 08:12 geoand

FYI, there is already a CDI Scope called ConversationScoped, may want to avoid a name collision

exabrial avatar Dec 06 '23 15:12 exabrial

Yeah, we definitely don't want more confusion

geoand avatar Dec 06 '23 15:12 geoand

how to properly define the boundary of the memory - like the web socket session

is this not similar to (or the exact same as) our struggles around scoping various contexts when it comes to reactive call chains, http sessions, web sockets, grpc, etc.? i.e. websocket is not "special" in this, is it?

maxandersen avatar Dec 06 '23 15:12 maxandersen

Let's, for the value of my (and possibly others') education, explore @ConversationScoped - why it makes sense, and then let's see why it would not.

From CDI Spec I see:



    begin() marks the current transient conversation long-running. A conversation identifier may, optionally, be specified. If no conversation identifier is specified, an identifier is generated by the container.

    end() marks the current long-running conversation transient.

    getId() returns the identifier of the current long-running conversation, or a null value if the current conversation is transient.

    getTimeout() returns the timeout, in milliseconds, of the current conversation.

    setTimeout() sets the timeout of the current conversation.

    isTransient() returns true if the conversation is marked transient, or false if it is marked long-running.

Meaning - it's a scope that you can have multiple instances of, that you can give an id, and that transcends multiple requests - seems like a perfect fit at first glance.

the conversation context is bound to the Servlet API.

Why is that? If you refer to the conversation context in Jakarta EE, is that not just limited to Jakarta EE - something we aren't bound/limited by?

As I understand it, the CDI spec is not limited to Jakarta EE, nor is there a limit on how many @ConversationScoped contexts we could have (even in Jakarta EE)?

maxandersen avatar Dec 07 '23 08:12 maxandersen
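To make the quoted semantics concrete, here is a plain-Java model of the transient/long-running lifecycle - not the real container-managed jakarta.enterprise.context.Conversation, just an illustration of the state machine those six methods describe (identifier generation is simplified):

```java
import java.util.UUID;

// Illustrative model of the Conversation lifecycle from the CDI spec:
// a conversation starts transient, begin() promotes it to long-running,
// end() demotes it back to transient.
class ConversationModel {
    private String id;
    private boolean transientState = true;
    private long timeoutMillis = 10 * 60 * 1000; // arbitrary default for the sketch

    void begin() {
        begin(UUID.randomUUID().toString()); // stands in for a container-generated id
    }

    void begin(String conversationId) {
        this.id = conversationId;
        this.transientState = false;
    }

    void end() {
        this.transientState = true;
        this.id = null;
    }

    String getId() { return transientState ? null : id; }
    long getTimeout() { return timeoutMillis; }
    void setTimeout(long millis) { this.timeoutMillis = millis; }
    boolean isTransient() { return transientState; }
}
```

What the model deliberately leaves out is exactly what the rest of this thread debates: how a conversation is propagated between requests and which "session" it is attached to.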

Adding @antoinesd as he also had some concerns about @ConversationScoped

agoncal avatar Dec 07 '23 08:12 agoncal

the conversation context is bound to the Servlet API.

Why is that? If you refer to the conversation context in Jakarta EE, is that not just limited to Jakarta EE - something we aren't bound/limited by?

The built-in implementation is bound to the Servlet API.

In theory, you could implement your own conversation context, but there are obstacles. First of all, the built-in (servlet-based) context should still be supported if you provide a custom one, because users would be super confused if they had the undertow extension (servlet) in their project but the lifecycle/behavior of the conversation context was completely different (compared to the well-known built-in impl).

The container must provide a built-in bean for jakarta.enterprise.context.Conversation that would have to support the custom conversation context. That's not impossible but requires additional SPI so that the container can use the custom conversation context.

Furthermore, the Conversation API does not allow you to switch to another conversation, as it expects that (in most cases) a single conversation is used in a request.

In any case, you would need to attach the conversation to some kind of "session". Which we currently don't have. And in my opinion it should not be based on a web technology because then it would not be usable outside an HTTP request anyway (which is the case for the built-in impl).

mkouba avatar Dec 07 '23 09:12 mkouba

Hi @mkouba

The built-in implementation is bound to the Servlet API.

That might be true, but I'm not sure why it matters?

In theory, you could implement your own conversation context

I mean, CDI was deliberately designed so that you can, so this is not just a matter of "theory"; it's how CDI was intended to be used.

but there are obstacles

So they should be removed.

First of all, the built-in context (servlet-based) should be supported if you provide a custom one.

Naturally, but how's this an "obstacle"?

Because users would be super confused if they have the undertow extension (servlet) in their project but the lifecycle/behavior of the conversation context is completely different (compared to the well-known built-in impl).

The lifecycle of the conversation context is defined by section 6.7.4-6.7.5 of the CDI spec. It is controlled by explicit user invocation of the Conversation interface. That section of the spec does not mention, and has never mentioned, anything about HTTP.

Section 11.3.4 outlines a particular implementation of a conversation context which is compatible with the Java EE web layer. That section of the spec is not supposed to be read as defining the behavior of the conversation context in general.

The container must provide a built-in bean for jakarta.enterprise.context.Conversation that would have to support the custom conversation context. That's not impossible but requires additional SPI so that the container can use the custom conversation context.

It's not only "not impossible", it should in principle be straightforward. If it's currently difficult, then that's a limitation we should address.

In any case, you would need to attach the conversation to some kind of "session". Which we currently don't have.

I mean, yes, of course. Naturally, that's something we would need to solve. I read that as being an implicit assumption in this proposal, not an objection to the proposal.

And in my opinion it should not be based on a web technology because then it would not be usable outside an HTTP request anyway (which is the case for the built-in impl).

Indeed it should not be limited to HTTP. That's implicit, I guess.

Furthermore, the Conversation API does not allow you to switch to another conversation, as it expects that (in most cases) a single conversation is used in a request.

Right, there's indeed a subtlety we need to nail down here, which is that a transient conversation context is implicitly created and associated with every HTTP request. We would need to make sure that there's no room for pathological interaction between this transient context and the "chat" conversation. I have not thought about this very carefully, and there might indeed be some things to solve here. That said, as a matter of first impression, I don't see why it should not be doable in principle.

@mkouba @ConversationScoped is an abstract notion, just like @RequestScoped. Its definition is completely clean of everything to do with HTTP. Just like you can have a request context for a request that does not originate in HTTP, you can have a conversation that doesn't take place over HTTP.

gavinking avatar Dec 07 '23 10:12 gavinking

a user may start multiple conversations concurrently, making things slightly more complicated.

@cescoffier so the wrinkle here is: can the "user" start multiple conversations from within the same request context? If so, then there's indeed some sort of mismatch with how @ConversationScoped is currently conceptualized. But if you can only start a single conversation within a given request context, I think it's a good fit. And, FTR, I don't know how "multiple conversations per request context" would even work. Seems like something that would be quite hard to fit into the whole model of CDI. (Note that "request context" ≠ HTTP request, you could of course do something to fork off a new request context, and start a conversation from there.)

gavinking avatar Dec 07 '23 10:12 gavinking

I mean, yes, of course. Naturally, that's something we would need to solve

Absolutely. In Quarkus we currently have a major gap in the WebSocket handling, which I envision will be a heavily used "transport" for such "conversations"

geoand avatar Dec 07 '23 10:12 geoand

Absolutely. In Quarkus we currently have a major gap in the WebSocket handling, which I envision will be a heavily used "transport" for such "conversations"

And that's even worth solving completely independently of this idea, right?

gavinking avatar Dec 07 '23 10:12 gavinking

Totally

geoand avatar Dec 07 '23 10:12 geoand

In theory, you could implement your own conversation context

I mean, CDI was deliberately designed so that you can, so this is not just a matter of "theory"; it's how CDI was intended to be used.

No matter what was intended I've never seen a custom conversation context impl. Hence it's a theory from my point of view.

First of all, the built-in context (servlet-based) should be supported if you provide a custom one.

Naturally, but how's this an "obstacle"?

We deliberately chose not to implement this context (and it's not even required by CDI Lite) because we wanted to avoid all the problems we had to deal with in Weld during the years; corner cases like manual session invalidation, character encoding issues mentioned in https://github.com/quarkiverse/quarkus-langchain4j/issues/47#issuecomment-1842446440, etc.

Because users would be super confused if they have the undertow extension (servlet) in their project but the lifecycle/behavior of the conversation context is completely different (compared to the well-known built-in impl).

The lifecycle of the conversation context is defined by section 6.7.4-6.7.5 of the CDI spec. It is controlled by explicit user invocation of the Conversation interface. That section of the spec does not mention, and has never mentioned, anything about HTTP.

Section 11.3.4 outlines a particular implementation of a conversation context which is compatible with the Java EE web layer. That section of the spec is not supposed to be read as defining the behavior of the conversation context in general.

I'm talking about "conversation propagation", which is inherently implementation-specific and is, in my opinion, part of the "lifecycle definition"; in the case of the built-in impl there are the cid and conversationPropagation request parameters.

And in my opinion it should not be based on a web technology because then it would not be usable outside an HTTP request anyway (which is the case for the built-in impl).

Indeed it should not be limited on HTTP. That's implicit, I guess.

:+1: but I'm not so sure everyone here takes it as implicit...

That said, as a matter of first impression, I don't see why it should not be doable in principle.

I'm not saying it's not doable, but I do believe that we should start with a clean slate: to avoid confusion (I think that most users out there perceive the conversation context as something connected to an HttpSession), to avoid the need for a built-in (servlet-based) context impl, and to build an extensible AI-specific API.

@mkouba @ConversationScoped is an abstract notion, just like @RequestScoped. Its definition is completely clean of everything to do with HTTP. Just like you can have a request context for a request that does not originate in HTTP, you can have a conversation that doesn't take place over HTTP.

I really do understand all these concepts...

mkouba avatar Dec 07 '23 10:12 mkouba

@mkouba all those issues about conversation propagation are something we would have with any kind of scope for this, would they not?

I don't think users see @ConversationScoped as something tied to HTTP - I fully get why you feel it that way, because it's the variant that had to be implemented.

But I would say we should not dismiss that @ConversationScoped makes sense to (re)use just because of the current Jakarta EE conversation scope implementation details.

maxandersen avatar Dec 07 '23 11:12 maxandersen

No matter what was intended I've never seen a custom conversation context impl. Hence it's a theory from my point of view.

OK, no problem, now you're (potentially) seeing it for the first time.

First of all, the built-in context (servlet-based) should be supported if you provide a custom one.

Naturally, but how's this an "obstacle"?

We deliberately chose not to implement this context

OK, I see. Then let me clarify my earlier response: naturally, if we already have HTTP session-based conversations then this new functionality would have to live peacefully side-by-side with the functionality that already exists. On the other hand, if no such functionality exists nor is needed, then there's no need to add it just because we have a different kind of conversation context.

We only need the things we need.

I'm talking about "conversation propagation" which is inherently implementation-specific

And quite deliberately so, given that a conversation is an abstract notion.

I'm not saying it's not doable but I do believe that we should start with a clean slate.

We would be talking about reusing one annotation, @ConversationScoped, and one interface, Conversation. That's within spitting distance of a "clean slate".

Now, it might indeed turn out that the Conversation interface is inappropriate in some respect, but you haven't demonstrated that yet. We would need to get a lot deeper into the weeds to be able to make that determination, it seems to me. But perhaps you've noticed something I've missed?

I think that most of the users out there perceive the conversation context as something connected to an HttpSession.

And now their perception will change, when they encounter the first instance of a conversation which is not connected to a HTTP session.

To avoid the need for built-in context impl (servlet based). To build an extendable AI-specific API.

That's what's being proposed, as far as I can tell. Nobody is saying that this should be based on servlets. As far as I can tell, the proposal is saying something quite different.

gavinking avatar Dec 07 '23 11:12 gavinking

That's what's being proposed, as far as I can tell. Nobody is saying that this should be based on servlets. As far as I can tell, the proposal is saying something quite different.

Exactly

geoand avatar Dec 07 '23 11:12 geoand

@mkouba @ConversationScoped is an abstract notion, just like @RequestScoped. Its definition is completely clean of everything to do with HTTP. Just like you can have a request context for a request that does not originate in HTTP, you can have a conversation that doesn't take place over HTTP.

A similar issue arose about request scope in reactive messaging - https://github.com/quarkusio/quarkus/issues/21367. There, for a long time, it was not well-defined what @RequestScoped means; in the end we ended up with the request scope not being activated by default, as its lifecycle would add overhead, but it is possible to enable it with @ActivateRequestContext.

I wonder if another similar adaptation would make sense here.

maxandersen avatar Dec 07 '23 11:12 maxandersen

in reactive messaging - https://github.com/quarkusio/quarkus/issues/21367 - for a long time it was not well-defined what @RequestScoped means

See, I mean, I would have said (and I'm pretty sure I have said it plenty of times) that there's a perfectly clear and natural interpretation of @RequestScoped for any messaging service / event processing architecture. When the program receives a message, or is notified of an event, then processing of that particular message or event takes place within a request context that's tied to that message or event.

And one can even imagine a conversation context in this sort of environment: a context that's tied to the larger business process that encompasses a chain of messages/events. Indeed, we once had a conversation context just like that in JBPM. (Remember JBPM?!) This isn't so far removed from what we're talking about here, in fact.

in the end we ended up with that requestscope is not activated by default as its lifecycle would add overhead

This is really quite surprising to me. I mean, how would lazily creating a HashMap the first time a program explicitly injects a @RequestScoped object add significant overhead to requests which aren't making use of injection? Like ... where is the "overhead" coming from?

gavinking avatar Dec 07 '23 11:12 gavinking

I think the english language gets in the way here and there are multiple "conversations" in play here.

Let me try and explain how I see the various "conversations" - tell me where I'm wrong/missing things :)

The "conversation" in LLM usecase are literally just a list of messages that contain the state that is growing over time as you have a "discussion" with the LLM. if you use chat.openai.com it matches to the list of previous "chats" on the left side in its UI. These conversations are data that can be in memory or even serialized out to disk/database. So those there will be multiple off per user.

While these "chat/discussion/conversation" are in the process memory you want to be able to set some trimming/eviction rules on them - at least not have them grow unbounded while in memory. Again, a user could have multiple of these.

Then there is the "conversation scope" which is a scope used to keep things around longer for a specific users interaction - and this is the kind of scope that (to me at least) makes sense to store those in-memory conversations. And for a users interaction I would say there would be just one conversation scope which potentially could have multiple LLM conversations.

The reason @ConversationScope makes sense to me is that it just fits nicely it seems:

A conversation represents a task—a unit of work from the point of view of the user. The conversation context holds state associated with what the user is currently working on. If the user is doing multiple things at the same time, there are multiple conversations.

The conversation context is active during any servlet request (since CDI 1.1). Most conversations are destroyed at the end of the request. If a conversation should hold state across multiple requests, it must be explicitly promoted to a long-running conversation.

maxandersen avatar Dec 07 '23 11:12 maxandersen
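The three-way distinction above can be sketched in plain Java (all names hypothetical): one scope per user interaction, holding several named LLM chat histories, each independently trimmed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a per-user "conversation scope" holding multiple
// LLM chats, each a bounded list of messages (the in-memory state that
// could also be serialized out to a database).
class UserConversationScope {
    private final int maxMessagesPerChat;
    private final Map<String, List<String>> chats = new LinkedHashMap<>();

    UserConversationScope(int maxMessagesPerChat) {
        this.maxMessagesPerChat = maxMessagesPerChat;
    }

    void append(String chatId, String message) {
        List<String> chat = chats.computeIfAbsent(chatId, id -> new ArrayList<>());
        chat.add(message);
        while (chat.size() > maxMessagesPerChat) {
            chat.remove(0); // trim the oldest so the chat never grows unbounded
        }
    }

    List<String> chat(String chatId) {
        return chats.getOrDefault(chatId, List.of());
    }

    int chatCount() {
        return chats.size();
    }
}
```

One instance of this would live as long as the user's interaction (the scope boundary the thread is debating), while the individual chats inside it come and go.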

Look, I found the docs for JBPM-managed conversations in Seam: https://docs.jboss.org/seam/2.3.1.Final/reference/html/jbpm.html

gavinking avatar Dec 07 '23 11:12 gavinking

in the end we ended up with that requestscope is not activated by default as its lifecycle would add overhead

This is really quite surprising to me. I mean, how would lazily creating a HashMap the first time a program explicitly injects a @RequestScoped object add significant overhead to requests which aren't making use of injection? Like ... where is the "overhead" coming from?

We really need to hear from @cescoffier and @mkouba on that one. My understanding was that it was not clear where the boundaries should be, and context propagation, as a consequence, became excessively expensive.

maxandersen avatar Dec 07 '23 11:12 maxandersen

My understanding was that it was not clear where the boundaries should be, and context propagation, as a consequence, became excessively expensive.

So, I mean at a guess there's two things that might have been going on there:

  1. propagating context with Mutiny has been very problematic in the past for reasons that we've already discussed to death, and don't need to get into here, and
  2. I have the impression that previously people were thinking "too big" in terms of the lifecycle of a request context. Request contexts should be fine-grained. When people start saying that the boundaries of the request are unclear, I suspect them of trying to make the request context too big.

Indeed—and now I'm really just speculating—perhaps something that's been going on here is that people have been trying to propagate request contexts too far precisely because they felt they didn't have anything "bigger" to work with. But now maybe we're recognizing that there actually is a "bigger" thing: @ConversationScoped, and it has explicit demarcation in code, which helps make the boundaries clear. Not sure that's quite right, but perhaps it's a helpful lens.

gavinking avatar Dec 07 '23 11:12 gavinking

a user may start multiple conversations concurrently, making things slightly more complicated.

@cescoffier so the wrinkle here is: can the "user" start multiple conversations from within the same request context? If so, then there's indeed some sort of mismatch with how @ConversationScoped is currently conceptualized. But if you can only start a single conversation within a given request context, I think it's a good fit. And, FTR, I don't know how "multiple conversations per request context" would even work. Seems like something that would be quite hard to fit into the whole model of CDI. (Note that "request context" ≠ HTTP request, you could of course do something to fork off a new request context, and start a conversation from there.)

Yes, in theory you can start multiple conversations concurrently from the same request scope. Typically, the processing of a single HTTP request can start two conversations with two different AI services, or even two with the same one. I don't think we should have a limitation here.

cescoffier avatar Dec 07 '23 15:12 cescoffier