mosec

feat: support server side response cache

Open kemingy opened this issue 2 years ago • 6 comments

Describe the feature

Refer to:

  • https://github.com/triton-inference-server/server/blob/main/docs/user_guide/response_cache.md

Some ML models might benefit from the cache.

As for storage, I think we should ideally support both a local and a remote cache.
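
One way to keep local and remote storage interchangeable is to hide both behind a small trait. The sketch below is only an illustration under assumptions: `CacheBackend`, `LocalBackend`, and `get_or_compute` are hypothetical names, not part of mosec, and a remote backend (for example one backed by Redis) would implement the same trait with network calls and error handling that are left out here.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction: a local map and a remote store
/// could both implement it, so the serving code does not care which is used.
pub trait CacheBackend {
    fn get(&mut self, key: &str) -> Option<Vec<u8>>;
    fn set(&mut self, key: &str, value: Vec<u8>);
}

/// Local, in-process backend (hypothetical). No TTL or size limit here.
pub struct LocalBackend {
    map: HashMap<String, Vec<u8>>,
}

impl LocalBackend {
    pub fn new() -> Self {
        Self { map: HashMap::new() }
    }
}

impl CacheBackend for LocalBackend {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }

    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.map.insert(key.to_string(), value);
    }
}

/// Return the cached response on a hit; otherwise run `compute`
/// (standing in for model inference) and store its result.
pub fn get_or_compute<B: CacheBackend>(
    backend: &mut B,
    key: &str,
    compute: impl FnOnce() -> Vec<u8>,
) -> Vec<u8> {
    if let Some(hit) = backend.get(key) {
        return hit;
    }
    let value = compute();
    backend.set(key, value.clone());
    value
}
```

With this shape, calling `get_or_compute` twice with the same key runs the closure only once; the second call is served from the backend.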

Why do you need this feature?

No response

Additional context

No response

kemingy avatar Jun 21 '23 03:06 kemingy

Hey Keming, I'm interested in taking a look at this issue. I briefly looked into some Rust crates for this feature and found this crate. It seems to support a Redis cache, a sized cache, and a timed cache (although I don't believe it has a combined timed + sized cache). My first thought would be to add an axum middleware to handle the caching logic. What are your thoughts on this?

AlexXi19 avatar Jun 24 '23 22:06 AlexXi19

I think this PR should come with a benchmark. I don't know if this lib fits our requirements.

  • multi routes
  • local & remote cache
  • cache TTL
  • cache size limit

I don't know how it handles the cache key, since the key/value could be a huge image (like 3 x 1000 x 1000 f32). The benchmark should include different key/value types, such as a simple string, an image, an embedding, etc.
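
To make the TTL and size-limit requirements concrete, here is a minimal sketch of a cache that enforces both at once, using only the standard library. Everything in it is an assumption for illustration: `TtlSizedCache` is a hypothetical type, keys are assumed to be pre-hashed `u64` values so a large payload is never stored as a key, the size limit counts entries rather than bytes, and eviction is plain insertion order (FIFO), not LRU.

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Hypothetical in-memory cache with both a TTL and an entry-count limit.
pub struct TtlSizedCache {
    ttl: Duration,
    capacity: usize,
    entries: HashMap<u64, (Instant, Vec<u8>)>,
    order: VecDeque<u64>, // insertion order, oldest key at the front
}

impl TtlSizedCache {
    pub fn new(capacity: usize, ttl: Duration) -> Self {
        Self {
            ttl,
            capacity,
            entries: HashMap::new(),
            order: VecDeque::new(),
        }
    }

    /// Return the cached value, dropping it first if its TTL has elapsed.
    pub fn get(&mut self, key: u64) -> Option<&[u8]> {
        let expired = match self.entries.get(&key) {
            Some((stored_at, _)) => stored_at.elapsed() >= self.ttl,
            None => return None,
        };
        if expired {
            self.entries.remove(&key);
            self.order.retain(|k| *k != key);
            return None;
        }
        self.entries.get(&key).map(|(_, value)| value.as_slice())
    }

    /// Insert or refresh a value, evicting the oldest entries over capacity.
    pub fn put(&mut self, key: u64, value: Vec<u8>) {
        if self.capacity == 0 {
            return;
        }
        if self.entries.insert(key, (Instant::now(), value)).is_some() {
            // Key already existed: drop its old position in the queue.
            self.order.retain(|k| *k != key);
        }
        self.order.push_back(key);
        while self.entries.len() > self.capacity {
            match self.order.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                None => break,
            }
        }
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A byte-based limit would probably matter more than an entry count when values range from short strings to multi-megabyte images, which is one of the things the benchmark should expose.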

kemingy avatar Jun 26 '23 08:06 kemingy

Good point. Do you think the cache should be aware of the exact content type?

AlexXi19 avatar Jun 26 '23 16:06 AlexXi19

No, because we don't really parse the HTTP request body on the Rust side. I listed different types of data only because their sizes are different.
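
Since the body is treated as opaque bytes, a cache key can be derived from the route plus the raw payload without knowing the content type. The sketch below assumes hypothetical helper names (`cache_key`, `f32_tensor_bytes`) and uses the standard library's `DefaultHasher` purely for illustration: its algorithm is not guaranteed to stay the same across Rust releases, and a 64-bit hash can collide, so a shared remote cache would likely need a stable, stronger digest.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hash the route and the raw request body into a fixed-size key.
/// The body is never parsed, so the content type does not matter.
pub fn cache_key(route: &str, body: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    route.hash(&mut hasher);
    body.hash(&mut hasher);
    hasher.finish()
}

/// Size in bytes of a dense f32 tensor with the given shape.
pub fn f32_tensor_bytes(shape: &[usize]) -> usize {
    shape.iter().product::<usize>() * std::mem::size_of::<f32>()
}
```

For the 3 x 1000 x 1000 f32 image mentioned earlier, `f32_tensor_bytes(&[3, 1000, 1000])` gives 12,000,000 bytes, so hashing such a payload costs far more than hashing a short string. That difference is why the benchmark inputs should vary in size.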

kemingy avatar Jun 26 '23 16:06 kemingy

For the benchmark, you can check https://github.com/tensorchord/inference-benchmark/tree/main/benchmark

kemingy avatar Jun 28 '23 10:06 kemingy