quarkus-langchain4j

Add Opensearch embedding store implementation

Open sebastienblanc opened this issue 1 year ago • 16 comments

Maybe @sboeckelmann can also review this since you are the author of the opensearch extension (and also added support in langchain4j)

sebastienblanc avatar Jan 09 '24 07:01 sebastienblanc

@yrodiere might also be interested in this :)

geoand avatar Jan 09 '24 07:01 geoand

Don't use the old rest-client artifacts; those are deprecated. Unfortunately, the AWS2 SDK needs the Apache HTTP Client stack to be set up. Use the new Async Java Client.

sboeckelmann avatar Jan 09 '24 09:01 sboeckelmann

@geoand what should we do about this one? The HTTP client issue is concerning.

cescoffier avatar Nov 17 '24 15:11 cescoffier

I think we can close this

geoand avatar Nov 17 '24 16:11 geoand

I was hoping for the apache httpclient 5 to become officially supported by Quarkus some day. OpenSearch depends on httpclient5, but Quarkus only provides for version 4.

So in my opinion there are 2 options:

  1. Have Opensearch use httpclient 4 (which is not possible)
  2. Have general httpclient5 support within Quarkus

sboeckelmann avatar Nov 17 '24 16:11 sboeckelmann

> I was hoping for the apache httpclient 5 to become officially supported by Quarkus some day

That is highly unlikely, unfortunately, as it would be a suboptimal use of our (limited) resources.

geoand avatar Nov 17 '24 17:11 geoand

> That is highly unlikely unfortunately as it would be a suboptimal use of our (limited) resources

Yes, exactly. Beyond that, I don't think httpclient5 support within Quarkus would make much of a difference anyway.

I actually don't see such a big issue with the httpclient5 dependency within OpenSearch. It's not nice to have two artifacts doing the same thing, but they are completely separate and independent, and apart from bloating things with additional packages, it does no real harm: the one included in Quarkus has the artifactId httpclient, while the other one's artifactId is httpclient5.

sboeckelmann avatar Nov 17 '24 19:11 sboeckelmann

@sboeckelmann There is a bit more to the story.

We don't know whether it works in native mode out of the box. Generally, when we have extensions, it's because we have to tune things. Also, it would not follow the underlying network architecture of Quarkus, so it would likely miss context propagation, security propagation, and so on.

cescoffier avatar Nov 24 '24 16:11 cescoffier

> @sboeckelmann There is a bit more to the story.

I really understand that. But as mentioned above, there's no httpclient 4 support in OpenSearch. The only option I can think of is moving the TranportProducer for httpclient5 out into an optional additional artifact. AWS should be fine.

sboeckelmann avatar Nov 24 '24 16:11 sboeckelmann

Even httpclient4 is not great. Is there any possibility of plugging an HTTP client managed by the framework itself?

cescoffier avatar Nov 24 '24 17:11 cescoffier

> Even httpclient4 is not great. Is there any possibility of plugging an HTTP client managed by the framework itself?

Thank you @cescoffier ! I’m really happy to see that the discussion has been picked up again and is moving forward.

Regarding the possibility of plugging in an HTTP client managed by the framework: I can definitely look into this. It would be very helpful if you could provide a hint or reference to where this has already been implemented. This would greatly assist in identifying what needs to be done and how best to support this feature in a clean and effective way.

sboeckelmann avatar Nov 25 '24 08:11 sboeckelmann

What @cescoffier is referring to is what, for example, the Kubernetes Client or Testcontainers do, where there is an HTTP-related SPI that various modules (like OkHttp, Vert.x HTTP Client, JDK HTTP Client, etc.) can implement.
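A minimal sketch of what such an HTTP SPI could look like, to make the idea concrete. All type names here (`HttpTransport`, `TransportResponse`, `JdkHttpTransport`, `StubHttpTransport`, `ClusterHealthClient`) are hypothetical and are not taken from the Kubernetes Client, Testcontainers, or OpenSearch; only `java.net.http.HttpClient` and the OpenSearch `/_cluster/health` endpoint are real.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

// Hypothetical SPI: the library codes against this interface only.
interface HttpTransport {
    TransportResponse get(URI uri);
}

// Hypothetical minimal response type shared by all implementations.
record TransportResponse(int status, String body) {}

// One possible implementation, backed by the JDK's built-in HTTP client.
final class JdkHttpTransport implements HttpTransport {
    private final HttpClient client = HttpClient.newHttpClient();

    @Override
    public TransportResponse get(URI uri) {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return new TransportResponse(response.statusCode(), response.body());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("Interrupted while sending request", e);
        }
    }
}

// Another implementation: canned responses, no network access.
final class StubHttpTransport implements HttpTransport {
    private final Map<URI, TransportResponse> canned;

    StubHttpTransport(Map<URI, TransportResponse> canned) {
        this.canned = canned;
    }

    @Override
    public TransportResponse get(URI uri) {
        return canned.getOrDefault(uri, new TransportResponse(404, ""));
    }
}

// Library-side code depends on the SPI, not on a concrete HTTP client.
final class ClusterHealthClient {
    private final HttpTransport transport;

    ClusterHealthClient(HttpTransport transport) {
        this.transport = transport;
    }

    boolean isReachable(URI base) {
        return transport.get(base.resolve("/_cluster/health")).status() == 200;
    }
}
```

With this shape, a framework could supply its own `HttpTransport` implementation (for instance one built on its managed HTTP client), and the library code in `ClusterHealthClient` would not need to change.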

geoand avatar Nov 25 '24 08:11 geoand

> What @cescoffier is referring to is what, for example, the Kubernetes Client or Testcontainers do, where there is an HTTP-related SPI that various modules (like OkHttp, Vert.x HTTP Client, JDK HTTP Client, etc.) can implement.

Can you point me to where this was done in the Kubernetes Client and what the SPI looks like there? It would be nice to have some sort of blueprint to follow.

sboeckelmann avatar Nov 25 '24 08:11 sboeckelmann

You can start by looking at io.fabric8.kubernetes.client.http.HttpClient. FWIW, the Kubernetes Client probably does more than most other such integrations would need.

geoand avatar Nov 25 '24 09:11 geoand

@sboeckelmann, any update on your side?

cescoffier avatar Dec 09 '24 07:12 cescoffier

> @sboeckelmann, any update on your side?

I am planning on providing httpclient5 and aws as separate HttpTransport provider modules. This should resolve the problem (or at least provide the required flexibility). But I won't be able to work on it before the beginning of next year.
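One way separate provider modules could be picked up at runtime is the JDK's `java.util.ServiceLoader`. The sketch below only illustrates that mechanism and says nothing about how the planned modules will actually be wired; `TransportProvider`, `SimpleProvider`, `TransportProviders`, and the priority scheme are all hypothetical.

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.ServiceLoader;
import java.util.stream.StreamSupport;

// Hypothetical SPI that each optional transport module would implement.
interface TransportProvider {
    String name();   // e.g. "httpclient5" or "aws"
    int priority();  // higher value wins when several modules are present
}

// Hypothetical trivial implementation, used here only for illustration.
record SimpleProvider(String name, int priority) implements TransportProvider {}

final class TransportProviders {
    private TransportProviders() {}

    // Finds providers that modules registered under
    // META-INF/services/<fully qualified name of TransportProvider>.
    static Optional<TransportProvider> discover() {
        return select(ServiceLoader.load(TransportProvider.class));
    }

    // Picks the highest-priority candidate, or empty if no module is on the classpath.
    static Optional<TransportProvider> select(Iterable<TransportProvider> candidates) {
        return StreamSupport.stream(candidates.spliterator(), false)
                .max(Comparator.comparingInt(TransportProvider::priority));
    }
}
```

With this layout, an application that adds only one module to its classpath gets that module's transport, and one that adds none gets an empty `Optional` it can turn into a clear startup error.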

sboeckelmann avatar Dec 09 '24 08:12 sboeckelmann