GPT-Cache for better speed and lower cost
When operating generative AI products, especially conversational ones, we often rely heavily on LLMs. However, the latency of generating each response can frustrate customers, and the cost incurred for each request can be burdensome for the product team.
In particular, once various guardrail checks are added, response times often exceed what customers will tolerate from a chatbot. Therefore, I believe concepts like GPT-Cache are essential. I'm curious whether there are any plans to integrate this concept into SK.
Here is a good reference: https://github.com/zilliztech/GPTCache
Are you suggesting a C# port? I'm also interested in something like this. Maybe extend VolatileMemoryStore with the missing functionality?
@joowon-dm-snu , great idea! Feel free to do a PR for this.
@Kevdome3000 I'm not good at C# :) and yes, I used MemoryStore to implement the cache. @evchaki I opened a conceptual PR for this issue!
@evchaki if I would like to contribute some code, could you please share the guidelines? I see that https://github.com/microsoft/semantic-kernel/pull/1143 was closed because it's something you didn't want to bring into the kernel core. I think I could also create a PR for Python; not sure about Java (I'm not a programmer, after all :) )
@weldpua2008 - here is a link to our guidelines --> https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md
It'd be good to implement this feature as an event when this gets picked up.
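To make the event-based suggestion concrete, here is a rough sketch (in Python for brevity; the hook names and wiring are hypothetical, not SK's actual event API): a pre-invocation hook short-circuits with a cached result, and a post-invocation hook stores new results, so caching stays outside the kernel core.

```python
class CachingHooks:
    """Hypothetical pre/post invocation hooks wiring a cache around an LLM call."""

    def __init__(self, cache: dict):
        # Exact-match dict cache for illustration; a semantic
        # store could be swapped in behind the same hooks.
        self.cache = cache

    def before_invocation(self, prompt: str):
        # Fired before the model call: a non-None return short-circuits it.
        return self.cache.get(prompt)

    def after_invocation(self, prompt: str, result: str) -> None:
        # Fired after the model responds: remember the result for next time.
        self.cache[prompt] = result


def invoke_with_hooks(prompt: str, call_llm, hooks: CachingHooks) -> str:
    cached = hooks.before_invocation(prompt)
    if cached is not None:
        return cached  # served from cache, model never invoked
    result = call_llm(prompt)
    hooks.after_invocation(prompt, result)
    return result
```

The appeal of this shape is that the caching policy lives entirely in the event handlers, so it can ship as an optional package rather than in the kernel itself.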
All .Net issues prior to 1-Dec-2023 are being closed. Please re-open, if this issue is still relevant to the .Net Semantic Kernel 1.x release. In the future all issues that are inactive for more than 90 days will be labelled as 'stale' and closed 14 days later.
I think this would be interesting to port from Python to C#; it would save a lot of cost in our projects.