
Add a streaming RAG method

Open scotttrinh opened this issue 1 year ago • 6 comments

Adds an SSE-style streaming response method along with a lower-level async generator.


Example output from httpie:

http http://localhost:3004 message=="Tell me something about pluto"
HTTP/1.1 200 OK
Connection: keep-alive
Content-Type: text/event-stream
Date: Wed, 17 Apr 2024 02:14:59 GMT
Keep-Alive: timeout=5
Transfer-Encoding: chunked

event: message_start
data: {
    "message": {
        "id": "chatcmpl-9Ep76xGnUpwaq8Xz8bSq8DAVRbFQ2",
        "model": "gpt-4-0125-preview",
        "role": "assistant"
    },
    "type": "message_start"
}

event: content_block_start
data: {
    "content_block": {
        "text": "",
        "type": "text"
    },
    "index": 0,
    "type": "content_block_start"
}

event: content_block_delta
data: {
    "delta": {
        "text": "Pl",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": "uto",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": "'s",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " surface",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " is",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " primarily",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " made",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " up",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " of",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " nitrogen",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " ice",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": ",",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " methane",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": ",",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " and",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " carbon",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": " mon",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": "oxide",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_delta
data: {
    "delta": {
        "text": ".",
        "type": "text_delta"
    },
    "index": 0,
    "type": "content_block_delta"
}

event: content_block_stop
data: {
    "index": 0,
    "type": "content_block_stop"
}

event: message_delta
data: {
    "delta": {
        "stop_reason": "stop"
    },
    "type": "message_delta"
}

event: message_stop
data: {
    "type": "message_stop"
}
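
To consume a stream like the one above, a client would typically concatenate the text_delta payloads. Here is a minimal sketch, assuming each JSON payload arrives on a single data: line on the wire (httpie pretty-prints it in the output above). parseSseEvents and collectText are hypothetical helper names, not part of edgedb-js.

```typescript
// Hypothetical helper type: not part of the edgedb-js API.
interface SseEvent {
  event: string;
  data: unknown;
}

// Split a raw SSE body into events. Events are separated by a blank line;
// multiple "data:" lines within one event are joined with "\n" before parsing.
function parseSseEvents(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of raw.split(/\r?\n\r?\n/)) {
    let event = "message"; // default event name when no "event:" field is sent
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) {
        event = line.slice("event:".length).trim();
      } else if (line.startsWith("data:")) {
        // Drop the single optional space after the colon.
        dataLines.push(line.slice("data:".length).replace(/^ /, ""));
      }
    }
    if (dataLines.length > 0) {
      events.push({ event, data: JSON.parse(dataLines.join("\n")) });
    }
  }
  return events;
}

// Concatenate the text of every content_block_delta event.
function collectText(events: SseEvent[]): string {
  let text = "";
  for (const { event, data } of events) {
    if (event !== "content_block_delta") continue;
    const delta = (data as { delta?: { type?: string; text?: string } }).delta;
    if (delta?.type === "text_delta" && typeof delta.text === "string") {
      text += delta.text;
    }
  }
  return text;
}

// Assumed wire format for three of the events shown above.
const wire = [
  "event: content_block_delta",
  'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Pl"}}',
  "",
  "event: content_block_delta",
  'data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"uto"}}',
  "",
  "event: message_stop",
  'data: {"type":"message_stop"}',
  "",
].join("\n");

console.log(collectText(parseSseEvents(wire))); // prints "Pluto"
```

A real client would parse incrementally as chunks arrive rather than buffering the whole body; this version only illustrates the event shapes.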


scotttrinh avatar Apr 17 '24 02:04 scotttrinh

@scotttrinh what do you think about an intermediate object to decouple input/output?

await queryRag(message, context).text()
await queryRag(message, context).response()
for await (const part of queryRag(message, context)) { ... }

CarsonF avatar May 17 '24 14:05 CarsonF

@CarsonF

what do you think about an intermediate object to decouple input/output?

I'm not sure I understand the suggestion here 🤔

scotttrinh avatar May 17 '24 14:05 scotttrinh

@CarsonF

what do you think about an intermediate object to decouple input/output?

I'm not sure I understand the suggestion here 🤔

Instead of three top-level query functions that vary the output shape, have just one; the output shape is then picked independently. The async iterable symbol can also be used instead of a third developer-facing name. The snippet shows what the user call sites would look like.
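
One way the intermediate object could be shaped is sketched below. This is only an illustration of the idea, not the implementation in the linked branch: RagRequest, fakeChunks, and this single-argument queryRag are hypothetical names, the network call is replaced by a stand-in generator, and the context argument and .response() are left out for brevity.

```typescript
// Hypothetical sketch: none of these names are the actual edgedb-js API.
class RagRequest implements AsyncIterable<string> {
  // The source is a factory, so each consumer starts a fresh stream.
  constructor(private readonly source: () => AsyncGenerator<string>) {}

  // Streaming output: enables `for await (const part of request)`.
  [Symbol.asyncIterator](): AsyncIterator<string> {
    return this.source();
  }

  // Buffered output: drain the stream and return the full text.
  async text(): Promise<string> {
    let out = "";
    for await (const part of this) out += part;
    return out;
  }
}

// Stand-in for the real request to the AI extension (assumption).
async function* fakeChunks(message: string): AsyncGenerator<string> {
  for (const chunk of ["Pl", "uto", ": ", message]) yield chunk;
}

// Single entry point: returns synchronously, does no work until consumed.
function queryRag(message: string): RagRequest {
  return new RagRequest(() => fakeChunks(message));
}

async function main(): Promise<void> {
  console.log(await queryRag("hello").text()); // prints "Pluto: hello"
  for await (const part of queryRag("hello")) {
    console.log(part);
  }
}

main();
```

Because queryRag returns the object synchronously, `await queryRag(...).text()` awaits the promise from text() rather than the object itself.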

CarsonF avatar May 17 '24 14:05 CarsonF

Ahh, I understand now. Seems reasonable, so we'll have to answer the question of which is better along some other axes:

  1. What is more intuitive for the end user?
  2. What do other SDKs (like OpenAI and Anthropic) do?
  3. What has the lowest maintenance overhead?
  4. What is easiest to evolve in the future?
  5. What is easiest to document?
  6. What does the Python version of this same functionality do?

Some of those questions are research for me; others will require a bit of thinking and trying to intuit an answer. Feel free to state the case for a single function that returns an object exposing the different interfaces vs. separate functions.

scotttrinh avatar May 17 '24 15:05 scotttrinh

As far as #1 goes, fetch has the same API

await fetch(...).json()
await fetch(...).arrayBuffer()

I don't see any problems here.

I can't speak to 2 & 6.

As far as 3 & 4 go, both the inputs & outputs have to work together in the implementation, obviously. But this decoupling allows modifying args or output methods without needing to adjust multiple signatures. Giving the EdgeDBAI class a single entrypoint for RAG would play nicely with other AI strategies in the future.

I did a first pass here https://github.com/edgedb/edgedb-js/compare/stream-ai...CarsonF:edgedb-js:stream-ai

CarsonF avatar May 17 '24 19:05 CarsonF

As far as #1 goes, fetch has the same API

Yeah, and while I agree that works as a primitive since it allows you to have one function with lots of different behavior, I think some people find that API annoying as an end-user API. Great for a primitive, though, and maybe that's really what we're building here.

As far as 3 & 4 go, both the inputs & outputs have to work together in the implementation, obviously. But this decoupling allows modifying args or output methods without needing to adjust multiple signatures. Giving the EdgeDBAI class a single entrypoint for RAG would play nicely with other AI strategies in the future.

Yeah, I think that's a compelling argument! Somehow that makes me feel even more like this is a primitive on which higher-level functionality should be built (like our soon-to-come Vercel AI SDK integration) and therefore we should optimize for a flexible low-level primitive.

Lemme shop the idea around a bit more, thanks for taking the time to make an example implementation, that makes it nice and concrete for discussion. 🙏

scotttrinh avatar May 17 '24 19:05 scotttrinh

OK, so @diksipav and I have been going back and forth a bunch about the API here, and I think I have some more concrete thoughts on await queryRag(message, context).text() vs await queryRag(message, context)/await streamRag(message, context):

fetch-like API

One major difference here is that consumers of fetch need to inspect the Response object to even know which method is appropriate, based on headers like content-type and transfer-encoding. We don't have that restriction: the caller gets to tell the producer how it wants the data back, as a string or as a stream.
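
For comparison, the separate-functions shape could look roughly like this. It is a sketch under stated assumptions: generateChunks is a hypothetical stand-in for the lower-level async generator, and these one-argument queryRag/streamRag signatures are simplified for illustration rather than taken from the library.

```typescript
// Hypothetical stand-in for the low-level async generator (assumption).
async function* generateChunks(message: string): AsyncGenerator<string> {
  for (const chunk of ["Echo", ": ", message]) yield chunk;
}

// The caller asks for a stream.
function streamRag(message: string): AsyncGenerator<string> {
  return generateChunks(message);
}

// The caller asks for a string; built on top of the same stream.
async function queryRag(message: string): Promise<string> {
  let text = "";
  for await (const chunk of streamRag(message)) text += chunk;
  return text;
}

async function demo(): Promise<void> {
  console.log(await queryRag("pluto")); // prints "Echo: pluto"
  for await (const chunk of streamRag("pluto")) {
    console.log(chunk);
  }
}

demo();
```

Here the function name alone selects the output shape, so there is nothing to inspect on a returned object before choosing how to read it.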

Intermediate object

One possible advantage of an API where queryRag returns an intermediate object is that you can refetch the same query without having to redefine it. I haven't yet thought of a good use case for this, though. It might make sense in the way it does for the query builder, where the messages and context are fixed but the providers or config vary. I don't think that's common enough to make this a first-class design choice, given that the "query" we're talking about here is a pretty simple data structure: a string.

Consistency

This is not a deal breaker, but in my mind it might be a good tie-breaker: consistency with the Python package. There we have separate methods and no intermediate object.

scotttrinh avatar Sep 18 '24 17:09 scotttrinh

Maybe the Python library should change too 😛

CarsonF avatar Sep 19 '24 14:09 CarsonF