UTF-8 streams might break in current stream implementation

Open • lazydogP opened this issue 10 months ago • 4 comments

I'm using this package in a Node.js environment and calling cohere.chatStream to generate long Chinese texts. However, the replacement character (�) appears in random places in the 'stream-end' event. The following code may convert incomplete UTF-8 chunks, which the stream yields piecemeal, into strings.

https://github.com/cohere-ai/cohere-typescript/blob/40c146c396dd5f4e9c079a9f99f11b3b7c48208e/src/core/streaming-fetcher/Stream.ts#L29-L32

While Basic Latin characters require only 1 byte in UTF-8, other characters, such as CJK characters, need 3 or 4 bytes. This means a single character can be split across two chunks, and decoding each chunk on its own turns the partial bytes into replacement characters.
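To make the failure concrete, here is a minimal, self-contained sketch using the standard `TextEncoder`/`TextDecoder` globals available in Node.js and browsers. It is not the SDK's actual code, and `decodeEachChunk`/`decodeAsStream` are hypothetical names. It splits the bytes of "你好" in the middle of a character and compares decoding each chunk separately with reusing one decoder called with `{ stream: true }`, which buffers an incomplete trailing sequence until the rest arrives.

```typescript
// Hypothetical helpers illustrating the bug; not taken from cohere-typescript.
const bytes = new TextEncoder().encode("你好"); // 6 bytes: each of these CJK characters is 3 bytes

// Simulate a stream that delivers the bytes split in the middle of "好".
const chunks: Uint8Array[] = [bytes.slice(0, 4), bytes.slice(4)];

// Buggy pattern: every chunk is decoded as if it were complete input,
// so the dangling lead byte and the orphaned continuation bytes become U+FFFD.
function decodeEachChunk(parts: Uint8Array[]): string {
  return parts.map((part) => new TextDecoder("utf-8").decode(part)).join("");
}

// Fix: reuse one decoder and pass { stream: true } so an incomplete
// multi-byte sequence at the end of a chunk is held until the next chunk.
function decodeAsStream(parts: Uint8Array[]): string {
  const decoder = new TextDecoder("utf-8");
  let text = "";
  for (const part of parts) {
    text += decoder.decode(part, { stream: true });
  }
  text += decoder.decode(); // flush any bytes still buffered at end of input
  return text;
}

console.log(decodeEachChunk(chunks)); // contains U+FFFD replacement characters
console.log(decodeAsStream(chunks)); // 你好
```

For `Buffer` chunks in Node.js, `StringDecoder` from the built-in `string_decoder` module handles split multi-byte characters in the same way.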

lazydogP, Apr 11 '24 17:04

Hey @lazydogP, many thanks for this interesting find! We have reproduced it and will have a fix for you ASAP.

billytrend-cohere, Apr 11 '24 20:04

@lazydogP we're planning to move to SSE to fix this issue for you! Thanks for your patience.
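For context, Server-Sent Events (SSE) is a line-based text format in which each event is a block of `field: value` lines (for example `event:` and `data:`) terminated by a blank line, so a client buffers text until it sees that blank line and only ever parses complete events. The sketch below is a generic, minimal SSE parser written for illustration: it is not Cohere's implementation, `createSseParser` and `SseEvent` are hypothetical names, the sample payload is made up, and it assumes LF line endings (the SSE format also allows CRLF and CR). Note that the framing alone does not prevent the original bug; the multi-byte safety comes from decoding with `{ stream: true }` before splitting into events.

```typescript
// Hypothetical, minimal SSE parser for illustration; not Cohere's implementation.
// Assumes LF ("\n") line endings and ignores the id and retry fields.
type SseEvent = { event?: string; data: string };

function createSseParser(onEvent: (e: SseEvent) => void) {
  // One decoder for the whole stream; { stream: true } keeps split characters intact.
  const decoder = new TextDecoder("utf-8");
  let buffer = "";

  // Parse one complete event block (the text between blank lines).
  function dispatch(block: string): void {
    let event: string | undefined;
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // empty or comment line
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // A block without any data field dispatches nothing.
    if (dataLines.length > 0) onEvent({ event, data: dataLines.join("\n") });
  }

  return {
    push(chunk: Uint8Array): void {
      buffer += decoder.decode(chunk, { stream: true });
      let end = buffer.indexOf("\n\n");
      while (end !== -1) {
        dispatch(buffer.slice(0, end));
        buffer = buffer.slice(end + 2);
        end = buffer.indexOf("\n\n");
      }
    },
  };
}

// Usage: split the bytes in the middle of "你" and feed them as two chunks.
const received: SseEvent[] = [];
const parser = createSseParser((e) => received.push(e));
const wire = new TextEncoder().encode("event: content-delta\ndata: 你好\n\n");
parser.push(wire.slice(0, 28));
parser.push(wire.slice(28));
console.log(received[0].data); // 你好
```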

billytrend-cohere, Apr 17 '24 20:04

> @lazydogP we're planning to move to SSE to fix this issue for you! thanks for your patience

Is this resolved yet?

danny-avila, May 30 '24 15:05

Still happening on the latest version; it seems we can't rely on the 'stream-end' event to avoid the issue:

https://github.com/danny-avila/LibreChat/pull/2922/commits/fe93f3a9688e48536ffc7e319be3b0d9c31243ea

danny-avila, May 30 '24 16:05

Hey all, this issue is now resolved in our v2 chat because we have switched to SSE (Server-Sent Events), which makes the streams easier to parse. You can use it as follows:

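    import { CohereClient } from "cohere-ai";

    // Assumes your API key is stored in the CO_API_KEY environment variable.
    const cohere = new CohereClient({ token: process.env.CO_API_KEY });

    // Top-level await: run this as an ES module.
    const stream = await cohere.v2.chatStream({
        model: "command-r",
        messages: [{ role: "user", content: "give me lots of emojis" }],
    });

    // Events arrive already parsed from the SSE stream; only
    // "content-delta" events carry generated text.
    for await (const chat of stream) {
        if (chat.type === "content-delta") {
            process.stdout.write(chat.delta?.message?.content?.text ?? "");
        }
    }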
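Since the 'stream-end' event could not be relied on earlier in this thread, one robust pattern is to build the full text yourself by concatenating the `content-delta` text as it arrives. The sketch below assumes a hypothetical `ChatEvent` type that models only the fields the snippet above reads, and uses a fake async generator in place of the real stream so it runs offline; `collectText`, `fakeStream`, and the non-delta event types are illustrative placeholders, not SDK names.

```typescript
// Hypothetical minimal event shape mirroring only the fields the snippet above reads;
// the SDK's real stream event types are richer, so adapt as needed.
interface ChatEvent {
  type: string;
  delta?: { message?: { content?: { text?: string } } };
}

// Accumulate the generated text from "content-delta" events instead of
// relying on a final summary event.
async function collectText(stream: AsyncIterable<ChatEvent>): Promise<string> {
  const parts: string[] = [];
  for await (const chat of stream) {
    if (chat.type === "content-delta") {
      parts.push(chat.delta?.message?.content?.text ?? "");
    }
  }
  return parts.join("");
}

// Offline stand-in for cohere.v2.chatStream(...); the non-delta event types are placeholders.
async function* fakeStream(): AsyncGenerator<ChatEvent> {
  yield { type: "placeholder-start" };
  yield { type: "content-delta", delta: { message: { content: { text: "你" } } } };
  yield { type: "content-delta", delta: { message: { content: { text: "好" } } } };
  yield { type: "placeholder-end" };
}

collectText(fakeStream()).then((text) => console.log(text)); // 你好
```

With the real client, you would pass the value returned by `cohere.v2.chatStream(...)` to `collectText`, possibly adjusting the `ChatEvent` type to match the SDK's exported types.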

billytrend-cohere, Oct 03 '24 11:10