stretto icon indicating copy to clipboard operation
stretto copied to clipboard

Getting `fail to send msg to channel: cache set buf sender: channel closed` on inserts

Open sebasmagri opened this issue 2 years ago • 14 comments

Under some pressure, we start seeing this and the cache stops adding new entries. Any guidelines on troubleshooting this?

sebasmagri avatar Apr 07 '22 17:04 sebasmagri

Hi, could you provide some example code for me to reproduce?

al8n avatar Apr 13 '22 04:04 al8n

@al8n it's hard to reproduce in a development environment also from me, but under real load it starts happening after some minutes. Tracking down the code it seems like the insert buffer channel gets closed at some point and then we start seeing this. I've been trying to reproduce this also using the benchmarking scripts in stretto, but no luck so far.

This is why I'd be interested on some guidance to troubleshoot this.

sebasmagri avatar Apr 13 '22 08:04 sebasmagri

@sebasmagri which one are you using, Cache or AsyncCache?

al8n avatar Apr 13 '22 10:04 al8n

AsyncCache. I think it might be related to using wait too often. I had to do that since I started seeing some issues otherwise, but now I've removed it and I still get a very high dropped sets rate.

Il mer 13 apr 2022, 12:59 Al Liu @.***> ha scritto:

@sebasmagri https://github.com/sebasmagri which one are you using, Cache or AsyncCache?

— Reply to this email directly, view it on GitHub https://github.com/al8n/stretto/issues/13#issuecomment-1097913805, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAACXAJLWD6XIZNMAAMQRT3VE2SJBANCNFSM5S2D7HCQ . You are receiving this because you were mentioned.Message ID: @.***>

sebasmagri avatar Apr 13 '22 11:04 sebasmagri

I've also tried to track if .close is getting called explicitly at some point but didn't find any cases.

sebasmagri avatar Apr 13 '22 11:04 sebasmagri

For the record, the issue I get when I don't explicitly wait after inserting is that then I find the expiration map mostly already borrowed on update.

sebasmagri avatar Apr 13 '22 12:04 sebasmagri

Found something interesting using tokio-console, when this condition happens, I can only see the policy task in the tasks list. so could this confirm that AsyncCache::spawn is going down, potentially by receiving a stop event and exiting the loop?

sebasmagri avatar Apr 13 '22 17:04 sebasmagri

Yes, it could be AsyncProcessor stops. It's weird, because the only way to stop AsyncProcessor is to call close method or the reference counter of AsyncCache goes to 0. May need to add tracing as a new feature for this crate to help find possible bugs.

al8n avatar Apr 13 '22 18:04 al8n

As I'm putting the cache instance as managed state in a Rocket application, i wonder if the second case could be possible, however in such case also the policy should be dropped right?

Il mer 13 apr 2022, 20:03 Al Liu @.***> ha scritto:

Yes, it could be AsyncProcessor stops. It's weird, because the only way to stop AsyncProcessor is to call close method or the reference counter of AsyncCache goes to 0. May need to add tracing as a new feature for this crate to help find possible bugs.

— Reply to this email directly, view it on GitHub https://github.com/al8n/stretto/issues/13#issuecomment-1098335229, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAACXAJBFEIJS5KKHL4TGDLVE4D5LANCNFSM5S2D7HCQ . You are receiving this because you were mentioned.Message ID: @.***>

sebasmagri avatar Apr 13 '22 18:04 sebasmagri

Yes, it should be. However, it may also possible the PolicyProcessor be kept. Because in AsyncCache, the inner fields are wrapped by different Arc. It may be a bug.

al8n avatar Apr 13 '22 18:04 al8n

@al8n let me know if I can help checking somehow. We're looking forward to deploy a component to production using stretto and would like to get these issues sorted out.

sebasmagri avatar Apr 14 '22 16:04 sebasmagri

Thanks! The plan is to add the tracing stuff for the crate, especially the Processors. So that we can know when the Processor stops. Would you like to do this? Or I can start on this weekend. Once I complete, I will let you know and you can test it again.

al8n avatar Apr 14 '22 18:04 al8n

@al8n any updates on this front? I had a few days out and would like to catch up with this.

sebasmagri avatar Apr 20 '22 07:04 sebasmagri

Sorry, there is no update. I am busy with my uni study and my full-time work these days.

al8n avatar Apr 20 '22 14:04 al8n

The ExpirationMap is RwLock<RefCell<HashMap<K,V>>>, when trying to borrow mutable reference failed, the CacheProcessor will return, and finally the channel closed. Maybe you can remove the RefCell, just use RwLock's write method instead. Further more, return statement in CacheProcessor is not reasonable when error occurred. image image @sebasmagri @al8n

phantomhker avatar Apr 19 '23 07:04 phantomhker

Thanks @phantomhker! I am really silly at that moment for adding a RefCell when there is already a RwLock.

al8n avatar Apr 19 '23 07:04 al8n