Add LLM to the adapter and save the query and answer
Does the project's current LLM (Large Language Model) adapter support streaming answers? If not, are there plans to support this in the future for scenarios that require low latency? Thank you very much for your assistance.
@hicofeng When the model is deployed on a server machine and exposed via a URL, it can stream its output so the user does not have to wait for the full response. The functionality provided here invokes the deployed model only when there is no matching result in the cached data, following the OpenAI specification for the request; the details may vary depending on the specific model and deployment method used.
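
For illustration, here is a minimal sketch of that flow, assuming an OpenAI-compatible endpoint: on a cache miss the query is sent with `stream=True`, chunks are printed as they arrive, and the assembled answer is saved back together with the query. The `cache` dict, the base URL, and the model name are placeholders, not the project's actual API.

```python
# Minimal sketch of the cache-miss fallback described above. The dict stands
# in for the real cached data, and the base_url is a hypothetical
# OpenAI-compatible, self-hosted endpoint; adapt both to your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
cache: dict[str, str] = {}  # stand-in for the adapter's cached data


def ask(query: str, model: str = "my-deployed-model") -> str:
    # Return the cached answer when there is a matching result.
    if query in cache:
        return cache[query]

    # Cache miss: invoke the deployed model with streaming enabled so the
    # user sees output immediately instead of waiting for the full answer.
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": query}],
        stream=True,
    )
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content or ""
        print(delta, end="", flush=True)  # stream each chunk to the user
        parts.append(delta)
    print()

    # Save the query and the assembled answer back into the cache.
    answer = "".join(parts)
    cache[query] = answer
    return answer
```

Note that even with streaming, the full answer has to be assembled before it can be stored, so only the first (uncached) request pays the generation latency; subsequent matching queries are served directly from the cache.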