
Rails.cache.fetch unable to retrieve previous cached value if expired

Open poloka opened this issue 2 years ago • 18 comments

Steps to reproduce

key = 'foo'
args = { expires_in: 10 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
entry = Rails.cache.send(:read_entry, key, args)
begin
  Rails.cache.fetch(key, expires_in: 10) { raise 'error' }
rescue StandardError
  # nice to have to reset original cache value
  Rails.cache.write(key, entry.value, args) if entry&.expired? # write back into the cache the original value and reset cache 
  entry&.value
end

Problem statement

If an unexpected error occurs within the yield block, say because a downstream service is briefly unavailable, it would be nice to have the option to retrieve the previously cached value and re-inject it into the cache, retrying when the new cache period expires. Servers could then keep functioning with a stale value rather than crashing because a downstream service went down, making my application more resilient to problems in the services it consumes.
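The desired fallback can be sketched with a toy in-memory cache (a plain Hash standing in for Rails.cache, since read_entry and Entry are private Rails internals; `fetch_with_fallback` is a hypothetical name, not Rails API):

```ruby
# Toy cache entry mirroring the idea of ActiveSupport::Cache::Entry,
# but not the real class.
Entry = Struct.new(:value, :expires_at) do
  def expired?
    Time.now > expires_at
  end
end

STORE = {}

# Hypothetical helper: behaves like fetch, but when the block raises it
# falls back to the stale value and re-caches it for another period.
def fetch_with_fallback(key, expires_in:)
  entry = STORE[key]
  return entry.value if entry && !entry.expired?
  fresh = yield
  STORE[key] = Entry.new(fresh, Time.now + expires_in)
  fresh
rescue StandardError
  raise unless entry # no stale value to fall back to
  STORE[key] = Entry.new(entry.value, Time.now + expires_in) # re-inject stale value
  entry.value
end

fetch_with_fallback('foo', expires_in: 10) { 'bing' } # primes the cache
STORE['foo'].expires_at = Time.now - 1                # simulate expiry
value = fetch_with_fallback('foo', expires_in: 10) { raise 'service down' }
# value is 'bing' again, and the entry lives for another 10 seconds
```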

Actual behavior

If the yield block fails, the previous cached value is deleted and no longer accessible after the fact.

Enhancement request

My suggestion is to make read_entry publicly accessible, allowing consumers to hold onto the previous result before calling fetch. If fetch's block raises, the consumer can then catch the errors they consider acceptable, apply their own retry policy (for example exponential backoff), and write the previous value back into the cache for whatever wait period they choose before attempting to 'refresh' their cached content again.

System configuration

Rails version: 5.2 or greater
Ruby version: 2.6 or greater

poloka avatar Jun 09 '22 17:06 poloka

Hmm, I'm not sure I follow your example. I assume the sleep(10) indicates the boundary between two requests. The first request is above the sleep and that's when the cache value is written. The second request is after the sleep and that's when the cache read fails. But if that's the case how will read_entry work across requests?

ghiculescu avatar Jun 09 '22 22:06 ghiculescu

Hmm, I'm not sure I follow your example. I assume the sleep(10) indicates the boundary between two requests. The first request is above the sleep and that's when the cache value is written. The second request is after the sleep and that's when the cache read fails. But if that's the case how will read_entry work across requests?

The intent would be to get the entry just prior to the begin block and use its value inside the rescue. I just didn't add it a second time.

I just updated it to make more sense in the flow.

poloka avatar Jun 09 '22 23:06 poloka

But if you can get its entry just before the begin block doesn’t that mean it hasn’t expired? In which case fetch would not call its block, as it can also get the entry?

ghiculescu avatar Jun 09 '22 23:06 ghiculescu

Nope. read_entry retrieves the value stored in the cache regardless of whether it has expired; you can check expiration by calling .expired? on the returned entry. The implementation of fetch deletes an expired entry after its call to read_entry and before yielding the block, so you have no way to get the value afterwards: it calls a private method, handle_expired_entry, which deletes the entry from the cache when it has expired. So the only way to get the previous value is to read it beforehand, to account for a failure in the block.
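The behaviour described can be modelled with a toy store (simplified sketch, not the actual Rails source; the real logic lives in the private ActiveSupport::Cache::Store methods read_entry and handle_expired_entry):

```ruby
# Toy entry with an expiry timestamp.
Entry = Struct.new(:value, :expires_at) do
  def expired?
    Time.now > expires_at
  end
end

store = { 'foo' => Entry.new('bing', Time.now - 1) } # already expired

# Simplified model of fetch's expiry handling: an expired entry is
# deleted before the block is ever yielded.
def handle_expired_entry(store, key)
  entry = store[key]
  return entry if entry && !entry.expired?
  store.delete(key) # the stale value is gone at this point
  nil
end

entry = handle_expired_entry(store, 'foo')
# entry is nil and the store no longer holds 'foo', so if the block that
# regenerates the value raises, the old value is unrecoverable.
```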

poloka avatar Jun 09 '22 23:06 poloka

Not sure what you are trying to do, but it looks a lot like the race_condition_ttl options https://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch

byroot avatar Jun 13 '22 18:06 byroot

Not sure what you are trying to do, but it looks a lot like the race_condition_ttl options https://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch

As discussions are also occurring on the PR, this does not work. See my following reply: https://github.com/rails/rails/pull/45313#issuecomment-1152422487

If an error occurs inside the yield block, the opportunity to retrieve the previous value is lost: the previous entry is deleted before the yield block executes, so I cannot retrieve it after the fact.

poloka avatar Jun 13 '22 19:06 poloka

Ok, I didn't see the PR. I skimmed it and still don't understand your use case.

But anyway making Entry and the related methods public API is not desirable at all. I think it would help if you gave a more concrete example of what you are trying to do.

byroot avatar Jun 13 '22 19:06 byroot

@byroot I have a very concrete example provided in my steps to reproduce at the top of this discussion. My use case is simple: when I call an external service inside the yield block to retrieve a fresh value to cache, that call can fail and I am unable to get a new value. When that error occurs, I want my consumers to be unaware that a downstream service call failed and to continue working with the previously cached value. So what I want to do is return that previous cached value and retry the downstream service at a later point, when it may or may not be back up. The way to accomplish that is simply to place the previous entry's value back into the cache for a designated period of time and allow the normal flow to retry the next time the cache expires.

poloka avatar Jun 13 '22 19:06 poloka

Ok, that makes more sense.

What I propose is to either modify fetch or add an option to it so that if the block fails the value is put back. I think that's what race_condition_ttl should do; if it doesn't, we can change it.

But Entry shouldn't be exposed.

byroot avatar Jun 13 '22 19:06 byroot

Cool, thanks @byroot. I added race_condition_ttl to my example just to test the current behavior, and the results do not allow serving the previous value. Here is my example code:

key = 'foo'
args = { expires_in: 10, race_condition_ttl: 60 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
# entry = Rails.cache.send(:read_entry, key, args)
begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  puts "I landed in the exception case"
end

race_condition_ttl does as you suggest in that it retains the cached value internally and does not delete it immediately; I can verify that by running Rails.cache.send(:read_entry, key, args) inside my rescue block. However, there is absolutely no way to retrieve that previous value from the cache once my yield block has raised and I have fallen into the rescue block. I never see the I made it here output.

If Entry is not to be exposed, then there would have to be another way to automatically re-inject the existing value into the cache for another cache period, or to provide the value and a callback, perhaps in the form of an exception the consumer can catch, giving them another way to handle the retry period. Even in the latter case, the raised exception would have to include the previous value so the consumer could Rails.cache.write it back into the cache for whatever period they deem acceptable. Maybe the answer is new cache options on fetch: auto_inject_on_error as a boolean and allow_handle_block_retry as a boolean. The former would have to log that the yield block raised and the cache entry was re-injected, because it would be hard for a consumer to understand why their value isn't refreshing if the cache were swallowing the exception behind the scenes. The latter would at least let consumers define the re-injection parameters and log whatever information they want.

poloka avatar Jun 13 '22 20:06 poloka

Ok, I think I see where the confusion is. race_condition_ttl is for other processes to read the stale value while you regenerate it. You wish to return the stale value from that same process when you fail to regenerate it as a fallback. Is that correct?

I think it also somewhat relates to https://github.com/rails/rails/pull/41344.

What about Rails.cache.read("key", stale: true) ?

begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  Rails.cache.read(key, stale: true)
end

It means two reads instead of one, but as a fallback mechanism it's probably ok.

byroot avatar Jun 13 '22 20:06 byroot

You are correct: I have a single process that wants to read, then regenerate, and fall back to the stale value if regeneration fails. I've wondered how race_condition_ttl really works, because even though the entry has expired, every process that fetches the expired value could fall into the yield block and attempt the logic inside it. Say three processes all fetch the expired key at the exact same time: all three will attempt the yield, and the last successful execution wins the cache update. I do see from the documentation that if one process fetches before the others, it should bump the expires_in TTL for a short period, letting other threads continue with the stale value until the original thread updates the cache. But that only temporarily prolongs the underlying issue: if the yield block fails, then after that short extended TTL the cached value is lost and all processes will be attempting to retrieve a new entry. I do question whether this race-condition handling should just be baked into the cache's implementation, because who wouldn't want this type of functionality in the first place? Or at least a default value of, say, 10 seconds, since I would hope most cache refreshes don't take that long.

Let me give you the real-world scenario I ran into. I had a four-hour cache for data retrieved from an external service. That service went down for over 6 hours. After all 500+ of my Rails applications, each with multiple processes, expired their cache entries, they absolutely bombarded the dead service. When the service tried to come back online, so many clients were attempting to retrieve the new value to cache that they DDoS'ed the starting service and essentially took it back down.

What I wanted to solve is that if a service goes down, my service can run indefinitely with cached results and attempt to refresh its data at each cache TTL interval. What I have implemented as a workaround for now is a dual-cache system: a short-term and a long-term cache. If I lose the short-term cached value, I pull from the long-term cache, stuff that value back into the short-term cache, and reset the long-term TTL. This way, if the service goes down for 6 hours, all my Rails applications continue to work without interruption. Now I understand that if any application restarted and lost its cache, it would fail, and that is expected. Maybe at that point I should look into file_store to survive restarts; however, in a container world a new container would not share files, so I'm back to square one.

I am only looking to mitigate the situation for clients that already hold an existing value and keep them running as if nothing is wrong.
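The dual-cache workaround described above could be sketched like this (toy Hash stores standing in for two cache namespaces; `resilient_fetch` and all other names are illustrative, not Rails API):

```ruby
SHORT = {} # short-TTL cache entries
LONG  = {} # long-term backup copies

Cached = Struct.new(:value, :expires_at) do
  def expired?
    expires_at && Time.now > expires_at
  end
end

# Hypothetical helper: refresh via the block when possible; on failure,
# serve the long-term copy and re-prime the short-term cache so callers
# never see the outage.
def resilient_fetch(key, short_ttl:)
  entry = SHORT[key]
  return entry.value if entry && !entry.expired?
  fresh = yield
  SHORT[key] = Cached.new(fresh, Time.now + short_ttl)
  LONG[key]  = Cached.new(fresh, nil) # long-term copy never expires here
  fresh
rescue StandardError
  backup = LONG[key] or raise # no backup available: propagate the error
  SHORT[key] = Cached.new(backup.value, Time.now + short_ttl)
  backup.value
end

resilient_fetch('conf', short_ttl: 10) { 'v1' } # primes both caches
SHORT['conf'].expires_at = Time.now - 1         # short-term copy expires
value = resilient_fetch('conf', short_ttl: 10) { raise 'service down' }
# value is 'v1', served from the long-term copy
```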

poloka avatar Jun 13 '22 21:06 poloka

I do question if this race condition should just be baked into the cache's implementation

I don't think so. It's hard for a framework to assume that X seconds stale responses are acceptable for the users. That's extremely use case dependent.

That said, unless I'm mistaken all options can be passed to the constructor, so you can configure your cache with a default value for race_condition_ttl, which is pretty much what you suggest, but opt-in.
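For example (a hypothetical configuration fragment; the store type and TTL value are placeholders), the default could be set once via the store's constructor options, which Rails merges into every subsequent cache operation:

```ruby
# config/environments/production.rb
# Constructor options become the defaults for every read/write/fetch,
# so this gives each fetch a grace window without per-call options.
config.cache_store = :memory_store, { race_condition_ttl: 10.seconds }
```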

byroot avatar Jun 13 '22 22:06 byroot

Maybe it will be easier in this case not to use caching, but to retrieve the values from the 3rd party, store them in Redis indefinitely, use them wherever you need them, and just periodically refresh them.

fatkodima avatar Jun 15 '22 00:06 fatkodima

@fatkodima but then I would have to call another external service to retrieve that data and cache it locally anyway so I don't waste processing time on the external call. I've just moved the point of failure: the issue could still arise if my Redis service goes down. If you're suggesting running a Redis instance locally on the same server, that seems like a lot of overhead (and expensive $$) for something that could simply be baked into the existing cache design.

poloka avatar Jun 17 '22 17:06 poloka

I believe @fatkodima is saying just use Redis on your own, but don't set any expirations on the keys.

Redis.current.hmset('cachedthing', 'value', 'This is my cached value', 'at', Time.now.to_f)

result = Redis.current.hgetall('cachedthing')
#=> {"value"=>"This is my cached value", "at"=>"1655534182.348695"}

if !result['at'] || Time.at(result['at'].to_f) < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and hmset it
end

return result['value']

natematykiewicz avatar Jun 18 '22 06:06 natematykiewicz

Wait, what am I saying? You'd just need to use Rails Cache without an expiration:

value, cached_at = Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  ['This is my cached value', Time.now]
end

if !cached_at || cached_at < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and write it into the cache
end

return value

Alternatively, you could do:

Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  MyCacheJob.set(wait: 10.minutes).perform_later # enqueue a job to rebuild the cache in 10 minutes
  'This is my cached value'
end

natematykiewicz avatar Jun 18 '22 06:06 natematykiewicz

This issue has been automatically marked as stale because it has not been commented on for at least three months. The resources of the Rails team are limited, and so we are asking for your help. If you can still reproduce this error on the 7-0-stable branch or on main, please reply with all of the information you have about it in order to keep the issue open. Thank you for all your contributions.

rails-bot[bot] avatar Sep 16 '22 07:09 rails-bot[bot]