Add LLM to the adapter and save the query and answer
Does the project's current LLM (Large Language Model) adapter support streaming answers? If not, are there plans to support this in the future for scenarios that require low latency? Thank you very much for your assistance.
@hicofeng When the model is deployed on a server machine and exposed via a URL, it can stream its output so the user does not have to wait for the full response. The functionality provided here invokes the deployed model only when there is no matching result in the cached data, following the OpenAI specification for the request; the details may vary depending on the specific model and deployment method used.
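
For illustration, here is a minimal sketch of that flow, assuming an OpenAI-compatible endpoint: on a cache miss the query is sent with `stream=True`, chunks are printed as they arrive, and the assembled answer is saved back together with the query. The `cache` dict, the base URL, and the model name are placeholders, not the project's actual API.

```python
# Minimal sketch of the cache-miss fallback described above. The dict stands
# in for the real cached data, and the base_url is a hypothetical
# OpenAI-compatible, self-hosted endpoint; adapt both to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
cache: dict[str, str] = {}  # stand-in for the adapter's cached data


def ask(query: str, model: str = "my-deployed-model") -> str:
    # Return the cached answer when there is a matching result.
    if query in cache:
        return cache[query]

    # Cache miss: invoke the deployed model with streaming enabled so the
    # user sees output immediately instead of waiting for the full answer.
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # stream each chunk to the user
        parts.append(delta)
    print()

    # Save the query and the assembled answer back into the cache.
    answer = "".join(parts)
    cache[query] = answer
    return answer
```

Note that even with streaming, the full answer has to be assembled before it can be stored, so only the first (uncached) request pays the generation latency; subsequent matching queries are served directly from the cache.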