llama-cpp-python
Create server_streaming.py
This pull request adds the server_streaming.py example for the high-level API.
The server_streaming.py file is created under the examples/high_level_api/ directory. It implements a server streaming example using the high-level API. The code sends messages in an HTTP POST request to "http://localhost:8000/v1/chat/completions" and processes the response. If streaming is disabled, the content of the response is printed once the request completes; otherwise the code parses the streamed response and prints the content chunk by chunk.
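For reference, here is a minimal sketch of what such an example could look like, assuming the OpenAI-compatible /v1/chat/completions endpoint exposed by `python -m llama_cpp.server`. The `chat` helper, the payload fields, and the SSE parsing details are illustrative and may differ from the actual file in this PR.

```python
# Illustrative sketch, not the exact contents of server_streaming.py.
import json
import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed default server address


def chat(messages, stream=False):
    payload = {"messages": messages, "stream": stream}
    response = requests.post(URL, json=payload, stream=stream)
    response.raise_for_status()

    if not stream:
        # Non-streaming: the full completion arrives in a single JSON body.
        print(response.json()["choices"][0]["message"]["content"])
        return

    # Streaming: the server sends Server-Sent Events, one JSON chunk per "data:" line.
    for line in response.iter_lines():
        if not line:
            continue
        data = line.decode("utf-8").removeprefix("data: ")
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        print(delta.get("content", ""), end="", flush=True)
    print()


if __name__ == "__main__":
    chat([{"role": "user", "content": "Hello!"}], stream=True)
```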
This pull request aims to provide a practical example of server streaming using the high-level API, showcasing how to interact with an HTTP server and handle responses.
What is the purpose of this example? Is there a way to get real streaming? I mean, starting to show the user output while the LLM behind the REST server is still processing?