
Investigate caching options in `ApifyRequestQueueClient`

Open • Pijukatel opened this issue 4 months ago • 1 comment

A few points were raised during the implementation of `ApifyRequestQueueClient` that were not addressed immediately, because they were optimizations rather than blockers for releasing the client. They are listed here:

  • Local cache size of 1_000_000:
    • This could potentially consume all available resources. Maybe we could add dynamic resizing based on the currently available resources: if we reach a certain threshold, migrate a portion of the cache to a smaller one and drop the rest.
  • Deduplication can be based on a different cache:
    • It is convenient to reuse the existing request cache for deduplication, as it does not consume any additional resources. On the other hand, a full request cache is overkill for deduplication, which needs only the set of unique_keys, i.e. basically just the keys of the request cache. If the two are independent, the size of the request cache does not affect deduplication. The downside is that in some scenarios the second cache just duplicates information.
  • Better utilization of already fully hydrated requests, to avoid calling `await self.get_request(request.id)` for each `fetch_next_request` call. This might not be possible, but we should investigate whether there is room for improvement.
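To make the first two points more concrete, here is a rough sketch of what a resizable request cache with an independent deduplication set could look like. This is not the current `ApifyRequestQueueClient` code: `BoundedLRUCache`, `DedupIndex`, `shrink_to`, and the dict-shaped requests are all hypothetical names made up for illustration, and the trigger for shrinking (e.g. a memory threshold) is left out on purpose.

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any, Optional


class BoundedLRUCache:
    """Hypothetical LRU cache whose capacity can be reduced at runtime."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self._max_size = max_size
        self._items: OrderedDict[str, Any] = OrderedDict()

    def get(self, key: str) -> Optional[Any]:
        if key not in self._items:
            return None
        # Mark the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key]

    def put(self, key: str, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        self._evict()

    def shrink_to(self, new_max_size: int) -> None:
        """Keep only the most recently used entries and drop the rest."""
        if new_max_size <= 0:
            raise ValueError("new_max_size must be positive")
        self._max_size = new_max_size
        self._evict()

    def _evict(self) -> None:
        # Pop least recently used entries until we are within capacity.
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)

    def __contains__(self, key: str) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


class DedupIndex:
    """Hypothetical: deduplication kept independent of the request cache."""

    def __init__(self, cache_size: int) -> None:
        # Only the unique keys are needed for deduplication.
        self._seen_unique_keys: set[str] = set()
        # Full requests live in a separate, bounded cache.
        self.request_cache = BoundedLRUCache(cache_size)

    def add(self, unique_key: str, request: dict) -> bool:
        """Return True if the unique key has not been seen before."""
        is_new = unique_key not in self._seen_unique_keys
        if is_new:
            self._seen_unique_keys.add(unique_key)
            self.request_cache.put(unique_key, request)
        return is_new
```

With this split, evicting or shrinking the request cache does not make the client forget which unique keys it has already seen. The cost is the one mentioned above: the key set duplicates the keys of the request cache, and in this sketch the set itself is still unbounded.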

Pijukatel • Aug 15 '25 12:08

Improved caching was applied, to a certain extent, to `ApifyRequestQueueClientSimple`, which does not need to handle the multiple-consumers scenario (https://github.com/apify/apify-sdk-python/pull/573). Since it will be the default client for now, we might consider this issue finished once that PR is merged.

Pijukatel • Sep 19 '25 09:09

https://github.com/apify/apify-sdk-python/pull/573

Pijukatel • Dec 04 '25 13:12