
Caching response with streaming body when fetching large file

Open aleasims opened this issue 8 months ago • 6 comments

First of all, thanks for the time you've spent on the library!

I'm developing a caching tool for large files over HTTP. As far as I can tell, HTTP bodies are always held fully in memory (the body is a Vec<u8>). Obviously this can't work when my file is over 100 GB. Is there anything I can do to incorporate streaming bodies? Do you have any plans to support this? Are there any limitations that prevent CacheManager::put from accepting HttpResponses with streaming bodies?
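One way to avoid holding a 100 GB body in memory is to move it through a fixed-size buffer, so peak memory depends on the buffer size rather than on the file size. This is a minimal sketch assuming a blocking std::io::Read source and std::io::Write sink; stream_body is a hypothetical helper (std::io::copy does essentially the same thing), not part of http-cache, and an async HTTP client would need the async counterparts of these traits.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper (not part of http-cache): copies `reader` into `writer`
/// through a buffer of `buf_size` bytes, so memory use stays bounded by the
/// buffer no matter how large the body is.
fn stream_body<R: Read, W: Write>(reader: &mut R, writer: &mut W, buf_size: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; buf_size];
    let mut total: u64 = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of body
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue, // retry
            Err(e) => return Err(e),
        };
        writer.write_all(&buf[..n])?; // forward only the bytes just read
        total += n as u64;
    }
    writer.flush()?;
    Ok(total)
}

fn main() -> io::Result<()> {
    // Simulated 1 MiB body; in real use the reader would be the network stream.
    let body: Vec<u8> = (0..1_048_576u32).map(|i| (i % 251) as u8).collect();
    let mut reader = io::Cursor::new(body.clone());
    // A Vec sink is used here only so the result can be checked.
    let mut sink: Vec<u8> = Vec::new();
    let copied = stream_body(&mut reader, &mut sink, 8 * 1024)?;
    assert_eq!(copied, body.len() as u64);
    assert_eq!(sink, body);
    println!("copied {} bytes through an 8 KiB buffer", copied);
    Ok(())
}
```

In a cache, the writer would be the on-disk entry, so the body passes through memory 8 KiB at a time instead of being collected into one Vec<u8>.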

aleasims, Apr 09 '25 17:04

I think the problem was that we have to clone the response. There might be a way around this, but I'm not sure what that would look like yet.

06chaynes, Apr 09 '25 22:04

@06chaynes thanks for the response!

I'm probably missing something, but why is it necessary to clone the whole response, including the body? Doesn't that also introduce performance issues?
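To illustrate the performance concern: cloning a Vec<u8> body allocates a second buffer and copies every byte, while cloning a reference-counted buffer only increments a counter. This sketch uses std's Arc<[u8]> as the shared type; the helper names are hypothetical, and whether http-cache's HttpResponse could use such a type is an assumption. Sharing still keeps the whole body in memory, so it addresses the clone cost but not the 100 GB problem.

```rust
use std::sync::Arc;

/// Hypothetical helper: true if cloning `body` produced a separate heap buffer.
fn vec_clone_is_deep(body: &Vec<u8>) -> bool {
    let copy = body.clone(); // allocates a new buffer and copies every byte
    copy.as_ptr() != body.as_ptr()
}

/// Hypothetical helper: true if cloning the Arc points at the same buffer.
fn arc_clone_is_shallow(body: &Arc<[u8]>) -> bool {
    let other = Arc::clone(body); // only increments the reference count
    Arc::ptr_eq(body, &other)
}

fn main() {
    let body: Vec<u8> = vec![0u8; 16 * 1024 * 1024]; // 16 MiB body
    assert!(vec_clone_is_deep(&body)); // a second 16 MiB allocation was made

    let shared: Arc<[u8]> = Arc::from(body); // one move into shared storage
    assert!(arc_clone_is_shallow(&shared)); // clones share that storage
    println!("Vec clone copies the body; Arc clone shares it");
}
```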

I may be completely wrong, but I've only managed to find one place in the http-cache crate where the response is cloned: CACacheManager::put. It looks like this could be worked around by writing the response to the cache first and then reading it back to return it.

aleasims, Apr 10 '25 17:04

I might have confused this with the request being cloned; I'll look into it and play around with it. In the meantime, if you already have a fix in mind, I'd be happy to check out a PR.

06chaynes, Apr 10 '25 18:04

@06chaynes I've opened a PR with some ideas in it:

  • #104

This isn't ready-to-go code; I've just been experimenting, and I'd really appreciate your comments on it. It would be awesome if we could:

  • avoid loading the full HTTP body into memory
  • provide streaming interfaces there
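The two goals above could be sketched as a streaming interface along these lines, following the write-first-then-read-back idea from earlier in the thread. StreamingCache, DiskCache, put_stream and get_stream are hypothetical names, not http-cache's API. The sketch is blocking, stores only the body (no CachePolicy or headers), and writes directly to the final file, so a real implementation would likely want a write-to-temp-then-rename step to avoid serving partially written entries.

```rust
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::PathBuf;

/// Hypothetical streaming counterpart to a cache manager's put/get.
trait StreamingCache {
    /// Stream `body` into the cache under `key`; returns the bytes written.
    fn put_stream(&self, key: &str, body: &mut dyn Read) -> io::Result<u64>;
    /// Open a reader over a cached body, or None if `key` is not cached.
    fn get_stream(&self, key: &str) -> io::Result<Option<Box<dyn Read>>>;
}

/// Hypothetical disk-backed cache: one file per key inside `dir`.
struct DiskCache {
    dir: PathBuf,
}

impl DiskCache {
    fn new(dir: PathBuf) -> io::Result<Self> {
        fs::create_dir_all(&dir)?;
        Ok(DiskCache { dir })
    }

    /// Hex-encode the key so arbitrary URLs become safe file names.
    fn path_for(&self, key: &str) -> PathBuf {
        let name: String = key.bytes().map(|b| format!("{:02x}", b)).collect();
        self.dir.join(name)
    }
}

impl StreamingCache for DiskCache {
    fn put_stream(&self, key: &str, body: &mut dyn Read) -> io::Result<u64> {
        let mut file = File::create(self.path_for(key))?;
        // io::copy moves the data through a small fixed-size buffer.
        let n = io::copy(body, &mut file)?;
        file.flush()?;
        Ok(n)
    }

    fn get_stream(&self, key: &str) -> io::Result<Option<Box<dyn Read>>> {
        match File::open(self.path_for(key)) {
            Ok(f) => Ok(Some(Box::new(f) as Box<dyn Read>)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    let cache = DiskCache::new(std::env::temp_dir().join("streaming-cache-sketch"))?;
    let key = "GET:https://example.com/big.bin";
    let mut body = io::Cursor::new(b"pretend this is 100 GB".to_vec());
    let written = cache.put_stream(key, &mut body)?;

    // Write first, then read the entry back from the cache to serve it.
    let mut out = Vec::new();
    if let Some(mut reader) = cache.get_stream(key)? {
        reader.read_to_end(&mut out)?;
    }
    assert_eq!(written as usize, out.len());
    assert_eq!(out, b"pretend this is 100 GB");
    assert!(cache.get_stream("missing")?.is_none());
    Ok(())
}
```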

aleasims, Apr 12 '25 22:04

Nice! I'll absolutely check this out.

06chaynes, Apr 13 '25 01:04

Tiny comment on my previous point:

"I may be completely wrong but I've managed to find only one place in http-cache crate where response is cloned - it is in CACacheManager::put."

There is no need to clone the response there at all: bincode::serialize only borrows the data (it takes a reference), so you could just do:

diff --git a/http-cache/src/managers/cacache.rs b/http-cache/src/managers/cacache.rs
index ea2416d..96dd613 100644
--- a/http-cache/src/managers/cacache.rs
+++ b/http-cache/src/managers/cacache.rs
@@ -55,10 +55,10 @@ impl CacheManager for CACacheManager {
         response: HttpResponse,
         policy: CachePolicy,
     ) -> Result<HttpResponse> {
-        let data = Store { response: response.clone(), policy };
+        let data = Store { response, policy };
         let bytes = bincode::serialize(&data)?;
         cacache::write(&self.path, cache_key, bytes).await?;
-        Ok(response)
+        Ok(data.response)
     }
 
     async fn delete(&self, cache_key: &str) -> Result<()> {

I've created a separate PR for this, since it's a small, trivial fix:

  • #105
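The fix in the diff works because of Rust's move semantics: the response is moved into Store, serialized through a shared reference (bincode 1.x's serialize takes &T), and then moved back out as data.response, so the body is never copied. This sketch reproduces the pattern with simplified, hypothetical stand-ins for HttpResponse, CachePolicy, Store, bincode::serialize and cacache::write so that it runs with std alone; the real types carry more fields.

```rust
/// Simplified, hypothetical stand-ins for the types in the diff.
#[derive(Debug, PartialEq)]
struct HttpResponse {
    status: u16,
    body: Vec<u8>,
}

struct CachePolicy;

#[allow(dead_code)] // `policy` is stored but not read in this sketch
struct Store {
    response: HttpResponse,
    policy: CachePolicy,
}

/// Stand-in for bincode::serialize: like it, this only needs `&T`.
fn serialize(store: &Store) -> Vec<u8> {
    let mut out = store.response.status.to_be_bytes().to_vec();
    out.extend_from_slice(&store.response.body);
    out
}

/// Move the response in, serialize through a borrow, move it back out.
fn put(response: HttpResponse, policy: CachePolicy, disk: &mut Vec<Vec<u8>>) -> HttpResponse {
    let data = Store { response, policy }; // move, no clone
    let bytes = serialize(&data); // shared borrow of `data`
    disk.push(bytes); // stand-in for cacache::write
    data.response // move the original response back out and return it
}

fn main() {
    let mut disk = Vec::new();
    let body = vec![7u8; 1024];
    let body_ptr = body.as_ptr();
    let returned = put(HttpResponse { status: 200, body }, CachePolicy, &mut disk);
    // Same heap buffer as before the call: the body was moved, never cloned.
    assert_eq!(returned.body.as_ptr(), body_ptr);
    assert_eq!(disk[0].len(), 2 + 1024);
}
```

Checking the buffer pointer confirms that the returned body is the original allocation rather than a copy.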

aleasims, Apr 13 '25 17:04

I have (hopefully) created a streaming implementation in #115, though I'd love more testing and eyes on it, as this was a first for me.

06chaynes, Jul 25 '25 02:07