More “Stream” Options
Describe the feature
When I ask for a response with the stream option on, it seems to perform a string concatenation for every generated character. For a long response this creates a lot of GC pressure, which is a serious performance problem.
So I'm wondering: is it possible to add an option that returns only the newly generated characters instead of the entire response built by string concat? The developer could then decide how to use those characters, as in the sketch below.
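For illustration, here is a minimal sketch of what the caller side could look like under such an option. The `OnDelta` callback name and the way it would be wired up are hypothetical, not the library's actual API:

```csharp
using System.Text;
using UnityEngine;

// Hypothetical caller-side usage of a delta-based stream option: the callback
// receives only the newly generated characters, and the developer chooses how
// to accumulate or display them.
public class DeltaStreamExample : MonoBehaviour
{
    readonly StringBuilder response = new StringBuilder();

    // Would be passed as the stream callback instead of a full-string callback.
    void OnDelta(string newChars)
    {
        response.Append(newChars); // one in-place append, no full-string rebuild
    }
}
```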
Alternatively, add an overload of `LLMCharacter.Chat` that changes the callback parameter type from `Callback<string>` to `Callback<StringBuilder>` and accepts an external `StringBuilder` from the caller; this would avoid the string-concat GC entirely (see the sketch after this paragraph).
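A rough sketch of what that overload could look like, assuming a `Callback<T>` delegate like the one the library already uses. Everything here besides the `Chat` name is illustrative:

```csharp
using System.Text;
using System.Threading.Tasks;

// Illustrative delegate, standing in for the library's Callback<T>.
public delegate void Callback<T>(T value);

public class ChatOverloadSketch
{
    // Proposed overload: the caller supplies the StringBuilder, the library
    // only appends to it, so no intermediate string is allocated per token.
    public async Task Chat(string query, StringBuilder buffer, Callback<StringBuilder> callback)
    {
        // Stand-in for the real token stream coming back from the model.
        foreach (string token in new[] { "Hel", "lo", ", ", "world" })
        {
            buffer.Append(token);      // append in place instead of string concat
            callback?.Invoke(buffer);  // hand the same builder back to the caller
            await Task.Yield();        // simulate asynchronous streaming
        }
    }
}
```

Since the caller owns the buffer, it can be cleared with `buffer.Clear()` and reused across calls, so no per-request allocation is needed either.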
Yes, this can be done; it needs a bit of engineering on the LlamaLib side.