Support `stream=True` without buffering up the full response

Open yagil opened this issue 2 years ago • 0 comments

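If the monitored program uses OpenAI response streaming (`stream=True`, delivered over SSE), incoming chunks get buffered until the `[DONE]` message arrives, so the program receives the whole response at once instead of incrementally. This changes the behavior of the monitored program and is undesirable.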
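
One possible direction, as a rough sketch only: pass each chunk through as soon as it arrives, keep a copy on the side, and run the token accounting once the stream ends. The names here (`tee_sse_chunks`, `on_complete`) are hypothetical and not part of tokmon, and the sketch assumes the chunks are available as an iterable of bytes, which may not match how tokmon actually intercepts traffic.

```python
from typing import Callable, Iterable, Iterator, List


def tee_sse_chunks(
    chunks: Iterable[bytes],
    on_complete: Callable[[bytes], None],
) -> Iterator[bytes]:
    """Forward SSE chunks unchanged while keeping a copy for later accounting.

    Hypothetical helper: `chunks` stands in for whatever yields the raw
    response body, and `on_complete` for the token-counting callback.
    """
    seen: List[bytes] = []
    for chunk in chunks:
        seen.append(chunk)  # keep a copy for accounting
        yield chunk         # hand the chunk to the client right away
    # The upstream iterator is exhausted, so the stream is over
    # (normally right after the `data: [DONE]` event).
    on_complete(b"".join(seen))


if __name__ == "__main__":
    events = [
        b'data: {"delta": "Hel"}\n\n',
        b'data: {"delta": "lo"}\n\n',
        b"data: [DONE]\n\n",
    ]
    for out in tee_sse_chunks(events, lambda body: print(len(body), "bytes seen")):
        print("forwarded:", out)
```

With this shape the monitored program sees each chunk at the time it was sent, and the full body is only assembled for bookkeeping after the last chunk has been forwarded.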

Related code: see the commented-out block in tokmon.py around line 34.

Apr 23 '23 16:04 yagil