
OpenAI error: `Premature close (ERR_STREAM_PREMATURE_CLOSE)`

Open davidfant opened this issue 1 year ago • 13 comments

Sometimes when running OpenAI streaming I get the following error:

Error: Premature close
    at IncomingMessage.<anonymous> (/app/node_modules/node-fetch/lib/index.js:1748:18)
    at Object.onceWrapper (node:events:631:28)
    at IncomingMessage.emit (node:events:517:28)
    at emitCloseNT (node:internal/streams/destroy:132:10)
    at process.processTicksAndRejections (node:internal/process/task_queues:81:21) {
 code: 'ERR_STREAM_PREMATURE_CLOSE'
}

I don't know how to repro, but it happens consistently when I run big batches of jobs with many prompts. Sometimes it happens just when starting the request, sometimes it happens a while in (e.g. 10+ seconds into the generation).

ChatOpenAI.onFailedAttempt never gets called, so somehow p-retry inside ChatOpenAI isn't retrying requests when this happens. Instead, the agent runnable that invokes this.model catches the error.

Any idea how to make this error caught inside ChatOpenAI so that the request can be retried?

This is how I create my Langchain ChatOpenAI runnable:

import winston from "winston";

const STATUS_NO_RETRY = [
  400, // Bad Request
  401, // Unauthorized
  402, // Payment Required
  403, // Forbidden
  404, // Not Found
  405, // Method Not Allowed
  406, // Not Acceptable
  407, // Proxy Authentication Required
  408, // Request Timeout
  409, // Conflict
];

function createFailedAttemptHandler(logger?: winston.Logger) {
  return (error: any) => {
    logger?.warn('OpenAI failed attempt', { error, stack: (error as Error).stack });

    if (
      error.message?.startsWith('Cancel') ||
      error.message?.startsWith('AbortError') ||
      error.name === 'AbortError'
    ) {
      logger?.warn('OpenAI failed because of cancel/abort', { error });
      throw error;
    }

    const status = (error as any)?.response?.status ?? (error as any)?.status;
    if (status && STATUS_NO_RETRY.includes(+status)) {
      logger?.error('OpenAI failed because of status', { status });
      throw error;
    }

    if ((error as any)?.error?.code === 'insufficient_quota') {
      logger?.warn('OpenAI failed because of insufficient quota', { error });
      const err = new Error(error?.message);
      err.name = 'InsufficientQuotaError';
      throw err;
    }
  };
}

this.model = new ChatOpenAI({
  temperature: 0,
  modelName: 'gpt-4-1106-preview',
  maxTokens: 1000,
  streaming: true,
  cache: fields.cache,
  configuration: fields.config,
  maxRetries: 10,
  onFailedAttempt: createFailedAttemptHandler(fields.logger),
});

davidfant avatar Jan 29 '24 11:01 davidfant

🤖

Hey there, @davidfant! Great to see you diving into the depths of langchainjs once again. Hope your code is treating you well today! 🚀

The error you're encountering, Premature close (ERR_STREAM_PREMATURE_CLOSE), is typically thrown when a stream ends before it was expected to. This can happen for a variety of reasons, such as network issues or the server closing the connection prematurely.

In your case, it seems like the error is not being caught by the onFailedAttempt handler in p-retry because it's not considered a "retriable" error by default. p-retry only retries on operational errors (like network errors) and if the error has a retry property set to true.

To make p-retry retry on this error, you can modify your createFailedAttemptHandler function to add a retry property to the error when it's a Premature close error:

function createFailedAttemptHandler(logger?: winston.Logger) {
  return (error: any) => {
    // ...

    if (error.code === 'ERR_STREAM_PREMATURE_CLOSE') {
      error.retry = true; // Add this line
    }

    // ...
  };
}

This will make p-retry consider the Premature close error as a retriable error and it will retry the request.

However, keep in mind that this might not solve the underlying issue causing the Premature close error. It might be worth investigating why these errors are happening in the first place. It could be due to network issues, server issues, or something else.

I hope this helps! Let me know if you have any other questions.


dosubot[bot] avatar Jan 29 '24 11:01 dosubot[bot]

Do you need to use .stream() for this or could use .batch() instead? Catching streaming errors is much more nuanced (e.g. what to do with partial responses).

jacoblee93 avatar Jan 30 '24 01:01 jacoblee93

> Do you need to use .stream() for this or could use .batch() instead? Catching streaming errors is much more nuanced (e.g. what to do with partial responses).

I need to use .stream() (because the output is streamed to the user on the frontend)! I'd be fine with just rerunning the request if I get a partial response. Do you know how I'd catch the streaming errors in the first place? Right now they seem to fall through the retry logic.

davidfant avatar Jan 30 '24 09:01 davidfant

@jacoblee93 I think I see what's going on: when using streaming, completionWithRetry is used in ChatOpenAI._streamResponseChunks, which awaits (and should retry) the creation of the stream iterable. However, when later iterating over the stream, any errors thrown there won't be retried. Might that be right? https://github.com/langchain-ai/langchainjs/blob/6e5d02188bd29c7fe51ff1440a097150f88c547d/libs/langchain-openai/src/chat_models.ts#L463
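
Schematically, my reading of the linked code (paraphrased, not the actual source; convertToChunk is a stand-in name):

// Inside ChatOpenAI._streamResponseChunks, roughly:
const stream = await this.completionWithRetry(params, options); // creating the stream IS retried
for await (const part of stream) {
  // Errors thrown while iterating (e.g. ERR_STREAM_PREMATURE_CLOSE)
  // propagate straight up to the caller and are never retried.
  yield convertToChunk(part); // stand-in for the real chunk conversion
}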

A somewhat related issue: I'm also getting streams that sometimes end unexpectedly while generating JSON. This usually happens when OpenAI API requests take an unreasonably long time (e.g. 4 minutes vs. the usual ~20 seconds), so something is probably wrong on OpenAI's side, but Langchain doesn't let me catch these problems either. That throws errors in getEstimatedTokenCountFromPrompt because JSON parsing fails: https://github.com/langchain-ai/langchainjs/blob/6e5d02188bd29c7fe51ff1440a097150f88c547d/libs/langchain-openai/src/chat_models.ts#L711-L715

Those errors are also just thrown and not retried, because they happen outside of the retry logic.

One potential solution:

  • Wrap the full streaming logic in the retry logic.
  • In the defaultFailedAttemptHandler, rethrow the errors that are streaming-related: https://github.com/langchain-ai/langchainjs/blob/6e5d02188bd29c7fe51ff1440a097150f88c547d/langchain-core/src/utils/async_caller.ts#L18. That way, people still get the same behavior as now, but use cases like mine that want more control over error handling can provide a custom onFailedAttempt handler that e.g. decides to retry these errors (rough sketch after this list).
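
A rough sketch of what that handler change could look like (my reading of the proposal, not an actual patch; the error-code check is an assumption):

const defaultFailedAttemptHandler = (error: any) => {
  // Throwing from a p-retry onFailedAttempt handler aborts further attempts,
  // so rethrowing here preserves today's "no mid-stream retries" default.
  if (error?.code === 'ERR_STREAM_PREMATURE_CLOSE') {
    throw error;
  }
  // ...existing non-retryable checks (aborts, 4xx statuses) stay as-is...
};

A custom onFailedAttempt could then simply not rethrow that error, and p-retry would re-run the whole wrapped streaming call.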

Any reason you wouldn't want to wrap the whole streaming logic in the retry logic? https://github.com/langchain-ai/langchainjs/blob/6e5d02188bd29c7fe51ff1440a097150f88c547d/libs/langchain-openai/src/chat_models.ts#L526-L559

davidfant avatar Feb 01 '24 10:02 davidfant

> @jacoblee93 I think I see what's going on: when using streaming, completionWithRetry is used in ChatOpenAI._streamResponseChunks, which awaits (and should retry) the creation of the stream iterable. However, when later iterating over the stream, any errors thrown there won't be retried. Might that be right?

Yes, this is the intended behavior - we basically catch immediately thrown errors (e.g. missing auth tokens) for convenience. When we were implementing the original version of runnables, it wasn't 100% clear what default retry behavior should be mid-stream (abstractly, you could be streaming from some stateful API where simply re-requesting has consequences), so we don't do anything by default and just throw.

I think some way to hook a handler in would be reasonable though - for now, would it unblock you to implement it in your own application logic?

jacoblee93 avatar Feb 01 '24 19:02 jacoblee93

I think for token counting logic we should absolutely not error the entire call on parsing failures, so yes that's definitely a bug.

jacoblee93 avatar Feb 01 '24 19:02 jacoblee93

Sure, I'll implement the retry logic myself
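
Roughly what I have in mind, in case it's useful to others (an untested sketch in application code, not anything LangChain provides; streamWithRetry, the attempt limit, and the retryable check are all my own):

import { ChatOpenAI } from '@langchain/openai';
import type { BaseMessage } from '@langchain/core/messages';

async function streamWithRetry(
  model: ChatOpenAI,
  messages: BaseMessage[],
  maxAttempts = 3
): Promise<string> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    let buffer = '';
    try {
      // Errors thrown mid-iteration (e.g. ERR_STREAM_PREMATURE_CLOSE) land in
      // this catch instead of inside ChatOpenAI's internal p-retry wrapper.
      const stream = await model.stream(messages);
      for await (const chunk of stream) {
        buffer += chunk.content; // assuming plain string content
        // forward the chunk to the frontend here
      }
      return buffer; // stream finished cleanly
    } catch (error: any) {
      lastError = error;
      // Only re-run errors we believe are transient; rethrow everything else.
      if (error?.code !== 'ERR_STREAM_PREMATURE_CLOSE') throw error;
      // Partial output in `buffer` is discarded and the request is re-run.
    }
  }
  throw lastError;
}

The downside is the user sees the stream restart from the beginning on a retry, but that's acceptable for my use case.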

davidfant avatar Feb 05 '24 10:02 davidfant

Hi, @davidfant,

I'm helping the langchainjs team manage their backlog and am marking this issue as stale. From what I understand, you encountered a Premature close (ERR_STREAM_PREMATURE_CLOSE) error when running OpenAI streaming. There were discussions about marking this error as retriable for p-retry, using .batch() instead of .stream(), and wrapping the full streaming logic in the retry logic. However, you have decided to implement the retry logic yourself.

Could you please confirm if this issue is still relevant to the latest version of the langchainjs repository? If it is, kindly let the langchainjs team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation.

dosubot[bot] avatar May 06 '24 16:05 dosubot[bot]

@davidfant curious if you ended up finding a solution to this you could share?

kahliahogg avatar Sep 26 '24 17:09 kahliahogg

Any update on this?

EfrosIonelu avatar Jan 31 '25 12:01 EfrosIonelu

We are facing a similar issue with just the completions API.

Cornelius000 avatar Jan 31 '25 14:01 Cornelius000

Should have some mechanism here - will think through it a bit more.

jacoblee93 avatar Jan 31 '25 16:01 jacoblee93

Could be caused by https://github.com/node-fetch/node-fetch/issues/1767

irina-alymova avatar Feb 20 '25 21:02 irina-alymova