
Enhancement: Store Deserialized HttpResponse for Improved Performance

Open tusharmath opened this issue 2 years ago • 11 comments

To enhance the performance of the HTTP cache library, consider adding the capability to store the deserialized version of the HttpResponse. This would help developers avoid repeated deserialization every time a response is retrieved from the cache.

  • Current Behavior:

    • When retrieving a cached HttpResponse, developers need to deserialize the response every time.
    • This leads to repetitive and unnecessary processing, especially for frequently accessed responses.
  • Proposed Enhancement:

    • Alongside the serialized response, store a deserialized version.
    • When the cache is accessed, provide an option to retrieve the already deserialized HttpResponse (see the sketch after this list).
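
A minimal sketch of the idea (the names and types below are illustrative only, not part of http-cache's API; bincode is just an example codec):

use serde::Serialize;

// Hypothetical entry shape: keep the serialized bytes next to the already
// deserialized value so a cache hit can skip the deserialization step entirely.
pub struct DualEntry<T> {
    serialized: Vec<u8>,
    deserialized: T,
}

impl<T: Serialize> DualEntry<T> {
    // Pay the serialization cost once, at insert time.
    pub fn new(value: T) -> bincode::Result<Self> {
        Ok(Self { serialized: bincode::serialize(&value)?, deserialized: value })
    }

    // Cache hits hand back the deserialized value without touching the bytes.
    pub fn value(&self) -> &T {
        &self.deserialized
    }

    // The serialized form is still available for backends that need it.
    pub fn bytes(&self) -> &[u8] {
        &self.serialized
    }
}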

Benefits:

  • Performance: Faster retrieval times for cached responses, especially for applications where deserialization time is significant.
  • Ease of Use: Developers can bypass the deserialization step, simplifying their code.

Considerations:

  • Memory Usage: Storing deserialized versions might increase memory usage. This needs to be benchmarked and considered during implementation.
  • Cache Invalidation: Ensure that both serialized and deserialized versions are invalidated concurrently to avoid data mismatches.

Let me know what you think.

tusharmath avatar Aug 22 '23 14:08 tusharmath

I would be interested in seeing some benchmarks for how things are now as well as when storing the deserialized response. When I have some time I can look into adding those tests, unless that's something you would feel comfortable submitting.

06chaynes avatar Aug 23 '23 21:08 06chaynes

Happy to share some internal insights on this. In our benchmarking, if we use a custom cache that stores deserialized responses, we easily get around a 30% improvement in performance whenever there is a cache hit. For context, we are building a graphql-proxy layer that hits this API: https://jsonplaceholder.typicode.com/posts.

This is a simple API and not a very heavy one; the penalty of not having such caching increases linearly with payload size.

Search for deserialize in the attached flamegraph. [flamegraph]

BTW - This is a fantastic library and we love how performant it is in its default tuning.

tusharmath avatar Aug 24 '23 07:08 tusharmath

Thanks, glad you found it of some use!

Thanks for sharing your findings, and yeah that is a fairly large increase in performance. I will hopefully have some time soon(ish) to look into this further.

06chaynes avatar Aug 24 '23 14:08 06chaynes

@tusharmath So I was considering ways this might be accomplished, and I think one possible route is implementing a custom CacheManager that doesn't serialize things before storing them (and ofc wouldn't need to deserialize them coming out).

Playing around, I did create a version of this for Moka that at least works in the test I wrote for it: MokaManagerDeserialized.

I don't have any benchmarks written so I don't know how this compares to what exists but if you'd like to try it out feel free to see if you notice any improvements.

Edit: Where you would use MokaManager::default() you would use MokaManagerDeserialized::default().

06chaynes avatar Aug 25 '23 00:08 06chaynes

You could add the manager implementation to your app as well, I think this should work:

use http_cache::{CacheManager, HttpResponse, Result};

use std::{fmt, sync::Arc};

use http_cache_semantics::CachePolicy;
use moka::future::{Cache, ConcurrentCacheExt};

// A moka-backed manager that keeps entries in memory without serializing them.
#[derive(Clone)]
pub struct MokaManagerDeserialized {
    pub cache: Arc<Cache<String, Store>>,
}

impl fmt::Debug for MokaManagerDeserialized {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("MokaManagerDeserialized").finish_non_exhaustive()
    }
}

impl Default for MokaManagerDeserialized {
    fn default() -> Self {
        // Cache::new takes the maximum capacity (here 42 entries).
        Self::new(Cache::new(42))
    }
}

// What gets stored per cache key: the response and its policy, kept as plain
// structs rather than serialized bytes.
#[derive(Clone, Debug)]
pub struct Store {
    response: HttpResponse,
    policy: CachePolicy,
}

impl MokaManagerDeserialized {
    pub fn new(cache: Cache<String, Store>) -> Self {
        Self { cache: Arc::new(cache) }
    }
    pub async fn clear(&self) -> Result<()> {
        self.cache.invalidate_all();
        self.cache.sync();
        Ok(())
    }
}

#[async_trait::async_trait]
impl CacheManager for MokaManagerDeserialized {
    async fn get(
        &self,
        cache_key: &str,
    ) -> Result<Option<(HttpResponse, CachePolicy)>> {
        let store: Store = match self.cache.get(cache_key) {
            Some(d) => d,
            None => return Ok(None),
        };
        Ok(Some((store.response, store.policy)))
    }

    async fn put(
        &self,
        cache_key: String,
        response: HttpResponse,
        policy: CachePolicy,
    ) -> Result<HttpResponse> {
        let store = Store { response: response.clone(), policy };
        self.cache.insert(cache_key, store).await;
        self.cache.sync();
        Ok(response)
    }

    async fn delete(&self, cache_key: &str) -> Result<()> {
        self.cache.invalidate(cache_key).await;
        self.cache.sync();
        Ok(())
    }
}
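
Wiring it into the client is the same as with the stock MokaManager; roughly something like this (an untested sketch, using the usual reqwest-middleware setup):

use http_cache_reqwest::{Cache, CacheMode, HttpCache, HttpCacheOptions};
use reqwest_middleware::ClientBuilder;

// Swap MokaManagerDeserialized in where MokaManager would normally go.
let client = ClientBuilder::new(reqwest::Client::new())
    .with(Cache(HttpCache {
        mode: CacheMode::Default,
        manager: MokaManagerDeserialized::default(),
        options: HttpCacheOptions::default(),
    }))
    .build();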

06chaynes avatar Aug 25 '23 01:08 06chaynes

@tusharmath went ahead and made this its own crate for now. Will publish the first release soon.

You should be able to use it as a drop-in replacement for the moka manager that comes with the core library.

In Cargo.toml:

http-cache-reqwest = { version = "0.11.2", default-features = false }
http-cache-mokadeser = "0.1.0"

Then in your code:

use http_cache_reqwest::{Cache, CacheMode, HttpCache, HttpCacheOptions};
use http_cache_mokadeser::MokaManager;

client = client.with(Cache(HttpCache {
    mode: CacheMode::Default,
    manager: MokaManager::default(),
    options: HttpCacheOptions::default(),
}))

06chaynes avatar Sep 26 '23 21:09 06chaynes

Thanks @06chaynes! I will try it and compare performance. However, from first impressions we still need to deserialize the response every time, which is a bit expensive. I think what we are saving here is the serialisation cost.

tusharmath avatar Sep 27 '23 02:09 tusharmath

@tusharmath Sounds good! Though I am curious where you see this occurring. I might be misunderstanding or missing something, so I just want to be sure.

06chaynes avatar Sep 27 '23 02:09 06chaynes

The Store keeps the complete HttpResponse:

pub struct Store {
    response: HttpResponse,
    policy: CachePolicy,
}

This means that when I get the HttpResponse back, I end up parsing the body again to convert the bytes into a serde_json::Value. This happens over and over again, degrading performance. Instead, if the store could keep a pre-deserialized JSON response, it would only be deserialized once before being stored in the cache.
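
Roughly what I have in mind, purely as an illustration (assuming HttpResponse exposes its body bytes; the names here are not an API proposal):

use http_cache::HttpResponse;
use http_cache_semantics::CachePolicy;
use serde_json::Value;

// Illustrative only: parse the body once when the entry is inserted and keep
// the parsed Value next to the response, so cache hits skip serde_json::from_slice.
#[derive(Clone, Debug)]
pub struct Store {
    response: HttpResponse,
    parsed_body: Option<Value>,
    policy: CachePolicy,
}

impl Store {
    pub fn new(response: HttpResponse, policy: CachePolicy) -> Self {
        // If the body isn't valid JSON this simply stays None.
        let parsed_body = serde_json::from_slice(&response.body).ok();
        Self { response, parsed_body, policy }
    }
}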

tusharmath avatar Sep 27 '23 06:09 tusharmath

@tusharmath The middleware will need to return the response as an HttpResponse for me to work with; as of right now I don't have a way around this. I'm open to suggestions, but right now I can't picture a clear path forward for the requested changes.

06chaynes avatar Sep 27 '23 13:09 06chaynes

I've been revisiting the internals of the project after some of the latest stabilized updates to the language, and I think one of the changes I could make is getting rid of the HttpResponse struct entirely, and therefore the conversion. I don't have a great deal of free time to work on things right now, but I should be able to make some progress towards this eventually.

06chaynes avatar Jan 23 '24 22:01 06chaynes