[QUESTION] How should we debug auto-recovery errors?
Hi, we are using an in memory L1 and Redis L2 cache with FusionCache. Everything works great, however, we are getting this error in our terminal output when we restart our application.
[09:40:10 ERR] FUSION [N=FusionCache I=0361226f78fa44ba9c74b98cfedbb859] (O=0HN747TGSUJ6D K=test/OptionCategory/4): stopped auto-recovery because of an error after 1 processed items
Can't find a way to debug it and find its cause. We tried to set the logging level to debug but couldn't find relevant logs for this.
Hi @simi2525 , sorry for the delay but I just came back from (very late) summer vacations 😅
The error you are seeing is generate here (line 367).
This happens if the variable hasStopped has been set to true, which as you can see above on line 359 should log another log message with the exception.
Since you say there are no other log messages about such an error and you talked about restarting your app, my instinct tells me that it could be because of line 352, and as you can see the log level used for the OperationCanceledException is not Error but Debug (and you are probably not logging debug-level messages).
All in all it probably means that:
- you restart your app
- a restart is basically a shutdown + start
- during the shutdown, your app disposes the FusionCache instance (as it should)
- disposing FusionCache, in turn, cancels the auto-recovery processing
So, if when shutting down your app (or restarting it, since as we saw it's the same thing) there are some items in auto-reovery queue (because of some transient errors), you'll get that exception.
I'm going to think if there's a better way to handle this, to avoid surprises for other users like you... maybe just log the OperationCanceledException as Error, since it should be rare and it would not create a lot of background noise anyway?
Hope this helps, let me know!
I am experiencing the same issue, but we're getting hundreds per hour, thousands of these errors per day.
We're using Redis as a backplane and our caches are set to live very long (about a day).
It seems to be happening in all of our caches, not just a single one (we have different stores for different things)
Is there any way we can get additional information from the library about how to address?
Hi @casperOne , can you share your setup? I mean which FusionCache versions, the OS, etc. You mentioned Redis as backplane: are you also using it as the distributed cache? It may seem obvious but I'm asking because that is not always the case. Also, what do you have enabled regarding observability? I'm talking about logging, tracing, etc. Thanks.
@jodydonetti Here's our package setup:
<PackageReference Include="ZiggyCreatures.FusionCache.Backplane.StackExchangeRedis" Version="2.2.0" />
<PackageReference Include="ZiggyCreatures.FusionCache.Serialization.SystemTextJson" Version="2.2.0" />
<PackageReference Include="ZiggyCreatures.FusionCache" Version="2.4.0" />
<PackageReference Include="ZiggyCreatures.FusionCache.OpenTelemetry" Version="2.4.0" />
It occurs to me that version mismatches could be a problem here?
We're running on Windows.
Our code to configure looks like this:
// For ease-of-use.
var sc = serviceCollection;
// Add the cache.
sc
.AddFusionCache(cacheName)
.WithCacheKeyPrefix($"{environment}:{cacheName}:")
.WithRegisteredSerializer()
.WithRegisteredDistributedCache()
.WithRegisteredBackplane()
.WithDefaultEntryOptions(o => {
o.Duration = duration;
});
// Return the service collection.
return sc;
And our setup at the top level looks like this:
sc = sc.AddFusionCacheSystemTextJsonSerializer();
sc = sc.AddFusionCacheStackExchangeRedisBackplane(
o => o.Configuration = redisConnectionString
);
sc = sc.AddStackExchangeRedisCache(
o => o.Configuration = redisConnectionString
);
Happy to provide more info.
I've also tried to remove everything from Redis on restart to get it to a "clean state" but that didn't work either.
I've updated the versions to be consistent (2.4.0 all around) to no effect.
We have 50K+ log entries across all our envs today....
Hi @casperOne , can you share a couple of complete logged errors? Of course excluding any reserved info. Thanks.
@jodydonetti Thanks for the hint, I should have looked deeper into the logs. There was an issue with authenticating against Redis which we've resolved, so it was not an issue with the library at all.
Appreciate the assistance
Hi @jodydonetti , We also have same error in production. Let me share with you my configuration;
services.AddMemoryCache()
.AddFusionCache()
.WithSerializer(serializer: new FusionCacheSystemTextJsonSerializer())
.WithOptions(options =>
{
options.DistributedCacheKeyModifierMode = CacheKeyModifierMode.None;
options.CacheKeyPrefix = $"v1";
options.InternalStrings.TagCacheKeyPrefix = "_tag:";
options.DistributedCacheCircuitBreakerDuration = TimeSpan.FromMinutes(3);
})
.WithDefaultEntryOptions(new FusionCacheEntryOptions
{
AllowBackgroundDistributedCacheOperations = true,
JitterMaxDuration = TimeSpan.FromSeconds(2)
})
.WithDistributedCache(distributedCache: new RedisCache(optionsAccessor: new RedisCacheOptions()
{
Configuration = config.RedisCacheUrl
}))
.WithBackplane(new RedisBackplane(optionsAccessor: new RedisBackplaneOptions
{
Configuration = config.RedisCacheUrl
}));
And here is the error we get often in production;
FUSION [N=FusionCache I=0HNHSG078RDRL] (O=0HNHSG079APVM K=mycachekey): stopped auto-recovery because of an error after 1 processed items
I also checked the source code here and I didnt see any of the logs which was written in both of 'catch' blocks. So now there is two other possibilities if the; here or here.
What do you think? Maybe you can suggest me a way to detect my problem?
Regards