Remove SC fence in try_advance
Since an SC fence is issued when obtaining the guard, try_advance doesn't need to issue another fence ~~if it uses the guard's local epoch for advancing the global epoch~~.
cargo bench results:
-
master
test multi_alloc_defer_free ... bench: 2,961,160 ns/iter (+/- 94,133) test multi_defer ... bench: 1,641,059 ns/iter (+/- 23,348) test single_alloc_defer_free ... bench: 39 ns/iter (+/- 0) test single_defer ... bench: 11 ns/iter (+/- 0) test multi_flush ... bench: 20,732,241 ns/iter (+/- 264,081) test single_flush ... bench: 95 ns/iter (+/- 0) test multi_pin ... bench: 3,374,234 ns/iter (+/- 111,429) test single_pin ... bench: 5 ns/iter (+/- 0)
-
this patch
test multi_alloc_defer_free ... bench: 2,962,290 ns/iter (+/- 100,018) test multi_defer ... bench: 1,642,319 ns/iter (+/- 30,912) test single_alloc_defer_free ... bench: 39 ns/iter (+/- 0) test single_defer ... bench: 12 ns/iter (+/- 0) test multi_flush ... bench: 19,506,794 ns/iter (+/- 323,483) test single_flush ... bench: 76 ns/iter (+/- 1) test multi_pin ... bench: 3,289,903 ns/iter (+/- 70,836) test single_pin ... bench: 5 ns/iter (+/- 0)
@tomtomjhj would you please look at the CI failures?
The first one is a rust edition issue (old rustc doesn't support 2021 edition?) and the other is clippy lint for crossbeam-utils. I think it's better to address these in another PR.
CI failure has been fixed in #758.
pin_holds_advance test fails when I add artificial delay here (std::thread::sleep(std::time::Duration::from_micros(1))):
https://github.com/crossbeam-rs/crossbeam/blob/858bc6e03e1a688368262a4f43e8e35db69ac522/crossbeam-epoch/src/internal.rs#L466-L468
This happens when the global epoch decreases, which is in contrary to this comment: https://github.com/crossbeam-rs/crossbeam/blob/858bc6e03e1a688368262a4f43e8e35db69ac522/crossbeam-epoch/src/internal.rs#L338-L343
So crossbeam is relying on the additional global epoch load and fence in try_advance for the monotonicity of global epoch. This is an important assumption in Jeehoon's proof. However, it seems that this point is not proved in that document, and I couldn't figure out why that works.
A relatively simple solution for enforcing monotonicity is to update the global epoch with a (relaxed) CAS instead of store.