
[Question] epoch + 3 or epoch + 2?

Open ming535 opened this issue 3 years ago • 5 comments

Hi, I was studying the lecture on epoch-based garbage collection. The lecture proves that when an object is retired at epoch E, it is safe to free it at E + 3, based on the two "happens-before" relations.

[Image: Screen Shot 2022-03-11 at 15 48 14]

I was also looking at the code of crossbeam-epoch. It seems that crossbeam-epoch used E + 3 at one point and then reverted to E + 2:

  • E + 3 https://github.com/crossbeam-rs/crossbeam/issues/238
  • E + 3 https://github.com/crossbeam-rs/crossbeam/pull/416
  • E + 2 https://github.com/crossbeam-rs/crossbeam/pull/517

I am not sure if this is the right place to ask, but I am confused about why crossbeam-epoch reverted to E + 2.

ming535 avatar Mar 11 '22 07:03 ming535

After carefully reading this RFC https://github.com/crossbeam-rs/rfcs/blob/master/text/2017-07-23-relaxed-memory.md and the code of the PR, I think the essential difference is the removal of the SC fence in unlink/push_bag.

ming535 avatar Mar 15 '22 23:03 ming535

Hi, sorry for the late reply.

The essential difference between E+2 and E+3 is that the epoch consensus rule (concurrent epochs may differ by at most 1) doesn't hold in E+2.

Note that in pin, there can be some delay between loading the global epoch (loading the global epoch is essentially an optimization for checking all the other threads' local epochs) and storing the local epoch. During this interval, the global epoch can advance multiple times without taking the thread currently being pinned into account, resulting in 'local epoch < global epoch - 1'. Therefore, if retire tags the garbage with the local epoch, the garbage might be considered expired immediately. E+2 fixes this issue by tagging garbage with the global epoch (and using an SC fence).
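
To make the window concrete, here is a minimal single-threaded sketch that replays the interleaving by hand. It is not crossbeam-epoch's code: the names `stale_pin_demo` and `is_expired` are hypothetical, and the expiry rule (garbage tagged `tag` is freeable once the global epoch reaches `tag + 2`) is an assumption made for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Assumed expiry rule for this sketch: garbage tagged with `tag`
/// may be freed once the global epoch has reached `tag + 2`.
fn is_expired(tag: usize, global: usize) -> bool {
    global >= tag + 2
}

/// Replays the problematic interleaving in one thread and returns
/// (local epoch, global epoch) as they stand after the pin completes.
fn stale_pin_demo() -> (usize, usize) {
    let global = AtomicUsize::new(10);
    let local = AtomicUsize::new(0);

    // Step 1 of pin: read the global epoch.
    let seen = global.load(Ordering::Relaxed);

    // Meanwhile, other threads advance the global epoch twice. They cannot
    // take this thread into account, because it has not published its
    // local epoch yet.
    global.fetch_add(1, Ordering::SeqCst);
    global.fetch_add(1, Ordering::SeqCst);

    // Step 2 of pin: publish the (now stale) local epoch.
    local.store(seen, Ordering::SeqCst);

    (local.load(Ordering::SeqCst), global.load(Ordering::SeqCst))
}
```

With these starting values the function returns a local epoch of 10 and a global epoch of 12, so `is_expired(local, global)` is already true for garbage tagged with the local epoch, while garbage tagged with the global epoch is not yet expired.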

On the other hand, pin in E+3 checks that the stored local epoch is not stale (note that this is quite similar to the validation loop in hazard pointers). This enforces the epoch consensus rule. So retire can tag the garbage with the local epoch instead of the global epoch, and no additional synchronization is needed.
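
A rough sketch of what such a validating pin could look like is shown below. This is an illustration under assumptions, not the code from the crossbeam PR: the function name `pin_validated`, the two plain `AtomicUsize` parameters, and the exact placement of the fence are all hypothetical.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

/// Hypothetical validating pin: publish the local epoch, then re-read the
/// global epoch and retry until the published value is not stale.
fn pin_validated(global: &AtomicUsize, local: &AtomicUsize) -> usize {
    let mut epoch = global.load(Ordering::Relaxed);
    loop {
        // Publish the epoch we believe is current.
        local.store(epoch, Ordering::Relaxed);
        // Order the store above before the validating load below.
        fence(Ordering::SeqCst);

        // Validate: if the global epoch moved in the meantime, retry.
        let now = global.load(Ordering::Relaxed);
        if now == epoch {
            return epoch;
        }
        epoch = now;
    }
}
```

The loop has no bound on the number of retries, which is why a pin written this way is not wait-free: a thread can keep retrying as long as other threads keep advancing the global epoch.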

The advantage of E+3 is simplicity. As you can see from the slide, its correctness is very intuitive. On the other hand, the correctness proof for E+2 needs somewhat more involved reasoning, as described in Jeehoon's RFC. However, E+3's simplicity comes at the cost of making pin no longer wait-free due to the validation loop.

Then why revert E+3? It caused random segfaults in CI which IIRC weren't reproducible on our machines, and we couldn't figure out why.

tomtomjhj avatar Mar 16 '22 01:03 tomtomjhj

Oh, actually it's reproducible on my laptop (Intel). ~~It seems it's not reproducible only in AMD machines.~~

tomtomjhj avatar Mar 16 '22 01:03 tomtomjhj

https://github.com/tomtomjhj/crossbeam/commit/4522ab0db5bc43e106b6143b2933760cf20d6c6f seems to fix the issue.

tomtomjhj avatar Mar 16 '22 03:03 tomtomjhj

@tomtomjhj would you please upstream the change?

jeehoonkang avatar Sep 23 '22 10:09 jeehoonkang