Reuben Bond
@kengiczhangchao could you show me roughly what the timer task is doing? If it is running forever, that is a problem. Grains are single-threaded and cancellation in .NET is cooperative...
Do you know why you're seeing so many cache invalidations to begin with? Could you tell me more about your system?
Do you have CPU limits in Kubernetes? How many cores per node & per pod do you have? CPU limits can induce starvation. Which exact version of Orleans are you...
1900 mCPU will likely cause starvation - the way that CPU limits work in k8s is that once the limit is reached, all threads are paused until the next scheduling...
If CPU limits are not set, then it shouldn't be the issue. Average CPU usage doesn't represent spikes well, so it can be a bit deceptive, but there is a...
From those messages, we can see that the silo was evicted - but we don't know what happened on the silo itself. Do you have logs from the evicted pod?...
LocalSiloHealthMonitor is the main canary to look out for. It runs self-health checks and sounds the alarm, usually before catastrophe strikes. I strongly recommend not changing that CacheSize value -...
Decreasing cache size won't help scaling. Is your pod being evicted by k8s? 500 reminders is not much, so that should not be an issue. Setting logging to Debug can...
Are you running your containers with crash dump env vars set (make sure it's a full heap dump)? https://learn.microsoft.com/en-us/dotnet/core/diagnostics/collect-dumps-crash You may need to mount a volume so you can persist...
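As a rough sketch of what "crash dump env vars" could look like, assuming .NET 6 or later (older runtimes use the `COMPlus_` prefix instead of `DOTNET_`). The `/dumps` directory is a hypothetical mount point for a persistent volume, not something taken from this thread:

```shell
# Enable automatic dump collection when the process crashes.
export DOTNET_DbgEnableMiniDump=1
# 4 = full dump (1 = mini, 2 = heap, 3 = triage).
export DOTNET_DbgMiniDumpType=4
# Hypothetical path on a mounted volume; %p expands to the process ID.
export DOTNET_DbgMiniDumpName=/dumps/coredump.%p
```

In Kubernetes these would typically be set in the container's `env` section rather than exported in a shell, with `/dumps` backed by a volume that outlives the pod.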
> The direct probes are starting to fail, but succeeds by indirect probes. Does not tell me much.. but could be an angle?: Most likely, this indicates that the silo...