Caching response with streaming body when fetching large file
First of all, thanks for the time you've spent on the library!
I'm developing a caching tool for large files served over HTTP. As far as I can tell, HTTP bodies are always held fully in memory (the body is a Vec<u8>). That obviously can't work when a file is over 100GB. Is there anything I can do to incorporate streaming bodies? Do you have any plans to support this? Are there any limitations that prevent CacheManager::put from accepting HttpResponses with streaming bodies?
I think the problem is that we have to clone the response. There might be a way around this, but I'm not sure what that would look like yet.
@06chaynes thanks for the response!
I'm probably missing something, but why is it necessary to clone the whole response, including the body? Doesn't that also introduce performance issues?
I may be completely wrong, but I've managed to find only one place in the http-cache crate where the response is cloned: CACacheManager::put. It looks like that can be worked around by writing the response to the cache first and then reading it back to return it.
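To make the "write first, then read back" idea concrete, here is a minimal sketch using only the standard library. It is not the http-cache API: `put_then_reopen` and the file-based cache location are hypothetical, and a real implementation would go through the cache manager instead of a bare file.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

/// Hypothetical helper (not part of http-cache): stream `body` into a cache
/// file, then reopen that file so the caller gets a fresh reader instead of
/// a cloned in-memory body.
pub fn put_then_reopen<R: Read>(mut body: R, cache_path: &Path) -> io::Result<BufReader<File>> {
    let mut writer = BufWriter::new(File::create(cache_path)?);
    // io::copy moves the data through an internal fixed-size buffer,
    // so the whole body is never held in memory at once.
    io::copy(&mut body, &mut writer)?;
    writer.flush()?;
    drop(writer);
    // Hand back the cached copy as the response body.
    Ok(BufReader::new(File::open(cache_path)?))
}
```

The trade-off is an extra read from the cache on every miss, in exchange for never needing two copies of the body.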
I might have confused this with the request being cloned, I will look into this and play around with it. In the meantime if you already have a fix in mind I'd be happy to check out a PR.
@06chaynes I've opened a PR with some ideas in it:
- #104
This isn't ready-to-go code; I've just been playing around with the idea, and I'd really appreciate your comments on it. It would be awesome if there were a chance to:
- avoid loading full HTTP body into memory
- provide streaming interfaces there
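For illustration, the two goals above boil down to moving the body through a fixed-size buffer. The sketch below shows that with plain `std::io` traits; `copy_in_chunks` and `CHUNK_SIZE` are hypothetical names chosen for the example and are not taken from #104 or from http-cache.

```rust
use std::io::{self, Read, Write};

/// Assumed chunk size for this sketch; memory use stays at this bound
/// regardless of how large the body is.
const CHUNK_SIZE: usize = 64 * 1024;

/// Hypothetical helper: copy `src` to `dst` one chunk at a time and
/// return the total number of bytes copied.
pub fn copy_in_chunks<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<u64> {
    let mut buf = vec![0u8; CHUNK_SIZE];
    let mut total = 0u64;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of stream
            Ok(n) => n,
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        dst.write_all(&buf[..n])?;
        total += n as u64;
    }
    dst.flush()?;
    Ok(total)
}
```

An async version would follow the same shape with an async reader and writer, but which async traits the crate would expose is an open design question here.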
Nice! I will absolutely check this out
Tiny comment on my previous point:
I may be completely wrong but I've managed to find only one place in
http-cache crate where response is cloned - it is in CACacheManager::put.
There is no need to clone the response there at all: bincode::serialize takes its argument by reference and doesn't take ownership of the data, so you could just do this:
diff --git a/http-cache/src/managers/cacache.rs b/http-cache/src/managers/cacache.rs
index ea2416d..96dd613 100644
--- a/http-cache/src/managers/cacache.rs
+++ b/http-cache/src/managers/cacache.rs
@@ -55,10 +55,10 @@ impl CacheManager for CACacheManager {
         response: HttpResponse,
         policy: CachePolicy,
     ) -> Result<HttpResponse> {
-        let data = Store { response: response.clone(), policy };
+        let data = Store { response, policy };
         let bytes = bincode::serialize(&data)?;
         cacache::write(&self.path, cache_key, bytes).await?;
-        Ok(response)
+        Ok(data.response)
     }

     async fn delete(&self, cache_key: &str) -> Result<()> {
I've created a separate PR for this because it's a small, trivial fix:
- #105
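The ownership pattern behind that fix can be shown in isolation. In the sketch below, `Response`, `Store`, `serialize`, and `put` are simplified, hypothetical stand-ins (the real code uses `HttpResponse`, `CachePolicy`, `bincode::serialize`, and `cacache::write`); the point is only that a serializer which borrows its input leaves the struct intact, so the response can be moved back out afterwards without a clone.

```rust
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: Vec<u8>,
}

pub struct Store {
    pub response: Response,
    pub policy: String,
}

/// Hypothetical stand-in for a borrowing serializer: it only reads `store`.
pub fn serialize(store: &Store) -> Vec<u8> {
    let mut bytes = store.response.status.to_be_bytes().to_vec();
    bytes.extend_from_slice(&store.response.body);
    bytes.extend_from_slice(store.policy.as_bytes());
    bytes
}

/// Move the response into `Store`, serialize by reference, write the bytes
/// to `sink` (standing in for the cache), then move the response back out.
pub fn put(response: Response, policy: String, sink: &mut Vec<u8>) -> Response {
    let data = Store { response, policy };
    let bytes = serialize(&data);
    sink.extend_from_slice(&bytes);
    // No clone needed: `data` is still fully owned here.
    data.response
}
```

This removes the extra copy of the body, but the body still has to fit in memory once, so it does not address the streaming question by itself.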
I have (hopefully) created a streaming implementation in #115 though I would love more testing and eyes on it as this was a first for me