Rails.cache.fetch unable to retrieve previous cached value if expired
Steps to reproduce
key = 'foo'
args = { expires_in: 10 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
entry = Rails.cache.send(:read_entry, key, args)
begin
  Rails.cache.fetch(key, expires_in: 10) { raise 'error' }
rescue StandardError
  # Nice to have: write the original value back into the cache and reset the TTL
  Rails.cache.write(key, entry.value, args) if entry&.expired?
  entry&.value
end
Problem statement
If an unexpected error occurs within the yield block (say, because a service call is down for a short period), it would be nice to have the option to retrieve the existing cached value and re-inject it into the cache, to be checked again when the new cache period expires. This would allow servers to continue functioning with a stale value rather than crashing because a downstream service went down, making my server more resilient to problems in the services it consumes.
Actual behavior
If the yield block fails, the previous cached value is deleted and no longer accessible after the fact.
Enhancement request
My suggestion is to make read_entry publicly accessible, allowing consumers to first hold the previous result prior to calling fetch. If fetch yields an error, the consumer can then catch their acceptable errors, determine their retries for exponential backoff, and write back into the cache their wait period before attempting the 'refresh' of their cached content.
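The desired semantics can be sketched in plain Ruby, with a Hash-backed store standing in for Rails.cache (StaleCache and its Entry struct are illustrative names, not Rails API):

```ruby
# A plain-Ruby sketch of fetch-with-stale-fallback semantics. StaleCache is
# a stand-in for Rails.cache, not Rails API: when the regeneration block
# fails, it re-injects the stale value instead of losing it.
class StaleCache
  Entry = Struct.new(:value, :expires_at) do
    def expired?
      Time.now >= expires_at
    end
  end

  def initialize
    @store = {}
  end

  def fetch(key, expires_in:)
    entry = @store[key]
    return entry.value if entry && !entry.expired?

    begin
      fresh = yield
      @store[key] = Entry.new(fresh, Time.now + expires_in)
      fresh
    rescue StandardError
      raise unless entry # no previous value to fall back on
      # Re-inject the stale value so consumers keep working and the
      # refresh is retried after another full period.
      @store[key] = Entry.new(entry.value, Time.now + expires_in)
      entry.value
    end
  end
end
```

With this behavior, the repro above would serve 'bing' from the fallback path instead of losing the value when the block raises.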
System configuration
Rails version: 5.2 or greater
Ruby version: 2.6 or greater
Hmm, I'm not sure I follow your example. I assume the sleep(10) indicates the boundary between two requests. The first request is above the sleep, and that's when the cache value is written. The second request is after the sleep, and that's when the cache read fails. But if that's the case, how will read_entry work across requests?
The intent would be to get the entry just prior to the 'begin' block and just use its value inside the 'rescue'. I just didn't add it a second time.
I just updated it to make more sense in the flow.
But if you can get its entry just before the begin block doesn’t that mean it hasn’t expired? In which case fetch would not call its block, as it can also get the entry?
Nope. read_entry retrieves the value stored in the cache regardless of whether it's expired; you can check the expiration by calling .expired? on the returned object. The implementation of fetch deletes the entry after its use of read_entry and prior to yielding the block if it's expired, so you have no way to get the value after the fact. The implementation calls a method named handle_expired_entry which deletes the entry from the cache when expired. So the only way to get the previous value is to retrieve it beforehand to account for the failure.
Not sure what you are trying to do, but it looks a lot like the race_condition_ttl option: https://api.rubyonrails.org/classes/ActiveSupport/Cache/Store.html#method-i-fetch
As discussed on the PR as well, this does not work. See my reply: https://github.com/rails/rails/pull/45313#issuecomment-1152422487
If an error occurs inside the yield block the opportunity to retrieve the previous value is lost. The previous 'entry' is deleted prior to performing the execution of the yield block so I cannot retrieve it after the fact.
Ok, I didn't see the PR. I skimmed it and still don't understand your use case.
But anyway, making Entry and the related methods public API is not desirable at all. I think it would help if you gave a more concrete example of what you are trying to do.
@byroot I have a very concrete example provided in the steps to reproduce at the opening of this discussion. My use case is simple: when I try to retrieve a new value to cache from an external service (the yield block), that retrieval can error, leaving me unable to get a new value. When that error occurs, I want my consumers to be unaware that a downstream service call failed and to continue working with the previous cached value. So what I want to do is return that previous cached value and try hitting the downstream service again at a later point, when it may or may not be back up. The way to accomplish that is to simply place the previous entry's value back into the cache for a designated period of time and let the normal flow retry the next time the cache expires.
Ok, that makes more sense.
What I propose is to either modify or add an option to fetch so that if the block fails the value is put back. I think that's what race_condition_ttl should do; if it doesn't, we can change it. But Entry shouldn't be exposed.
Cool, thanks @byroot. I added race_condition_ttl to my example just to test the current behavior, and the results do not allow for serving the previous value. Here is my example code:
key = 'foo'
args = { expires_in: 10, race_condition_ttl: 60 }
Rails.cache.fetch(key, args) { 'bing' }
sleep(10)
# entry = Rails.cache.send(:read_entry, key, args)
begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  puts "I landed in the exception case"
end
race_condition_ttl does as you suggest in that it retains the cached value internally and does not delete it initially. I can see that by performing Rails.cache.send(:read_entry, key, args) for validation purposes while inside my rescue block. However, there is no way to retrieve that previous value from the cache once my yield block has errored out and I fall into the rescue block. I never see the 'I made it here' output.
If Entry is not to be exposed, then there would have to be another way to automatically inject the existing value back into the cache for another cache period, or to provide the value through a callback, maybe in the form of an exception to catch, giving the consumer another way to handle the retry period. But even in the latter case, the raised exception would have to include the previous value so the consumer can Rails.cache.write it back into the cache for whatever period they deem acceptable. Maybe these become new cache options for fetch: auto_inject_on_error as a boolean, and allow_handle_block_retry as a boolean. The former would have to log that an error occurred in the yield block and the cache entry was re-injected, because it would be hard for a consumer to understand why their value is not refreshing if the cache were swallowing exceptions behind the scenes. The latter would at least let consumers define the re-injection parameters and log whatever information they want.
Ok, I think I see where the confusion is. race_condition_ttl is for other processes to read the stale value while you regenerate it. You wish to return the stale value from that same process when you fail to regenerate it, as a fallback. Is that correct?
I think it is also somewhat related to https://github.com/rails/rails/pull/41344.
What about Rails.cache.read("key", stale: true)?
begin
  value = Rails.cache.fetch(key, args) { raise 'error' }
  puts "I made it here with #{value}"
rescue StandardError
  Rails.cache.read(key, stale: true)
end
It means two reads instead of one, but as a fallback mechanism it's probably ok.
You are correct: I have a single process that wants to read the stale value and regenerate it, falling back when regeneration fails. I've wondered how race_condition_ttl really works, because even though the value is expired, every process that attempts to fetch it could fall into the yield block and attempt the logic inside. Say three processes all access the cache via fetch at the exact same time: all three will attempt the yield, and the last successful execution wins the cache update. I do see from the documentation that if one process attempts the fetch before the others, it should update the cache, extending the expires_in TTL for a short period and allowing other threads to continue until the original thread updates the cache. But that only temporarily prolongs the immediate issue: if the yield block fails, then after that short extended TTL the cached value is lost and every process will be attempting to retrieve a new entry. I also question whether this race-condition handling should just be baked into the cache's implementation, because who wouldn't want this type of functionality in the first place? Or at least give it a default value of, say, 10 seconds, since I would hope most cache refreshes don't take that long.
Let me give you the real world scenario I ran into. I had a cache set to four hours for retrieving some data. The external service I attempted to call went down for over 6 hours. After all my 500+ rails applications with their multiple processes expired their cache entries, they were all absolutely bombarding the dead service. Once that service tried to come back online, I had so many services attempting to retrieve that new value to cache that it DDoS'ed the starting service and essentially took it back down.
What I wanted to solve is that if a service goes down, my service can run indefinitely with cached results and attempt to refresh its data at each cache TTL interval. What I have implemented as a workaround for now is a dual-cache system with a short-term and a long-term cache: if I lose my short-term cached value, I pull from the long-term cache, stuff it back into the short-term one, and reset the long-term TTL. This way, even if that service went down for 6 hours, all my Rails applications continue to work without interruption. I understand that if any service were restarted and lost its cache, it would fail, and that is expected. Maybe at that point I should look into file_store to see if it would help across restarts; however, in a container world a new container would not share files, so I'm back to square one.
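A minimal sketch of this dual-cache workaround, with a tiny in-memory store standing in for the real cache (dual_fetch, TinyStore, SHORT_TTL, and LONG_TTL are all illustrative names, not Rails API):

```ruby
# Illustrative sketch of the dual-cache workaround: a short-term entry is
# refilled from a long-term copy when the upstream refresh fails, and the
# long-term TTL is reset so the value survives extended outages.
# TinyStore fakes a Rails-like read/write interface for the example.
class TinyStore
  def initialize
    @data = {}
  end

  def write(key, value, expires_in:)
    @data[key] = [value, Time.now + expires_in]
  end

  def read(key)
    value, expires_at = @data[key]
    return nil if value.nil? || Time.now >= expires_at
    value
  end
end

SHORT_TTL = 10          # seconds between refresh attempts (illustrative)
LONG_TTL  = 24 * 3600   # how long to tolerate a dead upstream (illustrative)

def dual_fetch(cache, key)
  cached = cache.read("short:#{key}")
  return cached if cached

  begin
    fresh = yield
    cache.write("short:#{key}", fresh, expires_in: SHORT_TTL)
    cache.write("long:#{key}", fresh, expires_in: LONG_TTL)
    fresh
  rescue StandardError
    stale = cache.read("long:#{key}")
    raise if stale.nil? # nothing to fall back on, e.g. after a cold restart
    cache.write("short:#{key}", stale, expires_in: SHORT_TTL) # retry later
    cache.write("long:#{key}", stale, expires_in: LONG_TTL)   # reset long TTL
    stale
  end
end
```

A cold restart with an empty store still fails, as noted above; this only keeps already-warmed processes serving while the upstream is down.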
I am only looking to mitigate the situation for the population that knows of the existing value and keep them continuing as if nothing is wrong.
I do question if this race condition should just be baked into the cache's implementation
I don't think so. It's hard for a framework to assume that X seconds stale responses are acceptable for the users. That's extremely use case dependent.
That said, unless I'm mistaken, all options can be passed to the constructor, so you can configure your cache with a default value for race_condition_ttl, which is pretty much what you suggest, but opt-in.
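For example, options given to the store constructor act as defaults for every fetch (the store choice and TTL values below are illustrative):

```ruby
# config/environments/production.rb (illustrative values): options passed to
# the cache store constructor become defaults for every read/write/fetch.
config.cache_store = :redis_cache_store, {
  expires_in: 4.hours,
  race_condition_ttl: 10.seconds
}
```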
Maybe it will be easier for you in this case not to use caching, but to retrieve the values from the 3rd party, store them in Redis indefinitely, use them where you need them, and just periodically refresh them.
@fatkodima But then I would have to call another external service to retrieve that data and then cache it locally so I don't waste processing time making that external call. I would just be moving the potential point of failure, and the issue could still arise if my Redis service goes down. If you're suggesting I start a Redis instance locally on the same server, that seems like a lot of overhead (and expense) for something that could simply be baked into the existing cache design.
I believe @fatkodima is saying just use Redis on your own, but don't set any expirations on the keys.
Redis.current.hmset('cachedthing', 'value', 'This is my cached value', 'at', Time.now.to_f)
result = Redis.current.hgetall('cachedthing')
#=> {"value"=>"This is my cached value", "at"=>"1655534182.348695"}

if !result['at'] || Time.at(result['at'].to_f) < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and hmset it
end
return result['value']
Wait, what am I saying? You'd just need to use Rails Cache without an expiration:
value, cached_at = Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  ['This is my cached value', Time.now]
end

if !cached_at || cached_at < 10.minutes.ago
  # enqueue an ActiveJob to fetch the new value and write it into the cache
end
return value
Alternatively, you could do:
Rails.cache.fetch('cachedthing') do # don't set expires_in: 10.minutes here
  MyCacheJob.set(wait: 10.minutes).perform_later # enqueue a job to rebuild the cache in 10 minutes
  'This is my cached value'
end
This issue has been automatically marked as stale because it has not been commented on for at least three months.
The resources of the Rails team are limited, and so we are asking for your help.
If you can still reproduce this error on the 7-0-stable branch or on main, please reply with all of the information you have about it in order to keep the issue open.
Thank you for all your contributions.