kanal icon indicating copy to clipboard operation
kanal copied to clipboard

Hangs on async benchmarks

Open sbarral opened this issue 1 year ago • 8 comments

kanal hangs for me 99% of the time in all async benchmarks, both with the official benchmarks and with Tachyobench.

I was initially suspecting a problem with async notifications, but since it uses 100% CPU on all threads when hanging, I start to think it might actually be due to the use of spin locks, though I don't know for sure.

sbarral avatar Oct 17 '22 16:10 sbarral

thank you for your report, whats your hardware specs?

fereidani avatar Oct 17 '22 20:10 fereidani

thank you for your report, whats your hardware specs?

I only checked on my laptop so far, a i5-7200U (kaby lake, 2/4 cores).

sbarral avatar Oct 17 '22 20:10 sbarral

which rust version? is it hang on any specific test?

fereidani avatar Oct 17 '22 20:10 fereidani

which rust version? is it hang on any specific test?

This is with rust 1.64. The MPMC, MPSC and SPSC all have this issue. Not sure about the sequential test, i can check tomorrow (I'm on mobile now).

sbarral avatar Oct 17 '22 20:10 sbarral

ok, thanks, currently i can't reproduce it on my PC and laptop.

fereidani avatar Oct 17 '22 21:10 fereidani

ok, thanks, currently i can't reproduce it on my PC and laptop.

Well, this is weird... Do you have intel hardware?

I tried it on an EC2 instance and it had the exact same problem, so the issue is probably widespread. This was an EC2 t2.micro instance (exact hardware was Intel Xeon E5-2676 v3) so you can reproduce it using a free AWS starter account.

Investigating further I noticed that tests that reserve enough capacity for all messages, such as:

run_async!("bounded_mpsc(empty)", mpsc::<BenchEmpty>(Some(MESSAGES)));`

actually run.

In general, increasing the capacity seems to decrease the probability of hanging. Also, the SPSC test sometimes works even with small capacity. I managed to sometimes get the tests running with capacities of 10 or 100, but then the perf of kanal-async is not that good compared to flume-async or async-channel, which also suggests some issues with the notification primitives.

EDIT: I see that the Cargo.toml of the benchmark does not pin the version of tokio (it probably should, to improve reproducibility). My Cargo.lock says I'm using tokio-1.21.2.

sbarral avatar Oct 18 '22 05:10 sbarral

Yes, there is UB somewhere in code and I'm investigating it, I think there is a problem with our stack borrow rules, I should fix that first, I'm gonna keep you updated to run your tests again then. Are using nightly or stable Rust?

fereidani avatar Oct 18 '22 06:10 fereidani

Yes, there is UB somewhere in code and I'm investigating it, I think there is a problem with our stack borrow rules, I should fix that first, I'm gonna keep you updated to run your tests again then. Are using nightly or stable Rust?

Using stable only.

sbarral avatar Oct 18 '22 06:10 sbarral

Could you please rerun your tests, with the latest benchmark repo and the latest Kanal repo?

fereidani avatar Oct 22 '22 20:10 fereidani

I could only test on my laptop for now, but it seems to work fine with commit #774a05f :-)

Likewise on tachyobench, so I plan to merge support for kanal once you make a release.

sbarral avatar Oct 22 '22 20:10 sbarral

Thank you!

fereidani avatar Oct 23 '22 04:10 fereidani

Sorry I write here, I could not find a way to PM you so I hope you won't mind I ask here.

I would like to update my own benchmarks for an MPSC channel I wrote, but there is no official release of Kanal that I can benchmark since the pre2 release has this issue with async. I don't want to misrepresent the potential of Kanal, so do you feel that the master branch of Kanal is suitable or would you prefer that I don't benchmark Kanal just yet?

sbarral avatar Oct 23 '22 09:10 sbarral

Thank you for your professionalism, Kanal async is not ready yet, I need to redesign some parts of it, if you like to benchmark the sync version, the master branch is ok. but if you like to run benchmarks for async, I think it will be ready in pre3.

fereidani avatar Oct 23 '22 09:10 fereidani

No problem, I understand!

sbarral avatar Oct 23 '22 10:10 sbarral