tokmon
tokmon copied to clipboard
Support `stream=True` without buffering up the full response
If the monitored program makes use of OpenAI response streaming (with SSE), incoming chunks gets buffered until the [DONE] message. This alters the behavior of the monitored program and is undesirable.
Related code: see commented out block in tokmon.py around line 34.