mps icon indicating copy to clipboard operation
mps copied to clipboard

Fix some race conditions in xca6ll/hot/amcssth.

Open gareth-rees opened this issue 2 years ago • 11 comments

  • Use memory-model-aware builtins in GCC and Clang when a memory location may be written by one thread and read by another, avoiding race conditions due to out-of-order updates on ARM.
  • Call dylan_make_wrappers while the test is still single-threaded, preventing multiple threads from racing to call it.
  • Prevent dylan_init from creating a padding object, as we must not have an exact root pointing at a padding object.

Note that this does not fix all the race conditions—there is at least one more that we have not figured out—but it reduces the frequency of failures in amcssth to less than one in a hundred on my 8-core Apple M2.

Work towards #59 (Rare failure in amcssth in hot variety on Linux)

gareth-rees avatar Dec 24 '22 15:12 gareth-rees

@rptb1 You added #59 to the "Successfully merging this pull request may close these issues" list in the GitHub interface. I've removed it again because as far as I know this pull request does not fix that issue—it fixes some of the races but not all of them.

gareth-rees avatar Jan 13 '23 20:01 gareth-rees

@rptb1 You added #59 to the "Successfully merging this pull request may close these issues" list in the GitHub interface. I've removed it again because as far as I know this pull request does not fix that issue—it fixes some of the races but not all of them.

Do you think it's better practice (in GitHub) to only link completely fixed issues using the "Development" field, and link partially fixed issues from comments? It would make sense if GitHub closes those issues automatically.

(This could affect #97 )

rptb1 avatar Jan 14 '23 10:01 rptb1

Do you think it's better practice (in GitHub) to only link completely fixed issues using the "Development" field, and link partially fixed issues from comments? It would make sense if GitHub closes those issues automatically.

GitHub automatically closes the issues in the "Development" field when the pull request is merged, so if you don't want the issue to be closed automatically then you shouldn't put it there.

gareth-rees avatar Jan 14 '23 16:01 gareth-rees

Executing proc.review.entry

  1. Applying entry.universal and entry.impl

  2. Source documents are available, and include: "Built-in Functions for Memory Model Aware Atomic Operations" (GCC), "LLVM Atomic Instructions and Concurrency Guide" (LLVM), and "Clang Language Extensions", section on "Atomic operations". Review status unknown! Two of these are nicely referenced from the code.

Entry passed.

Entry took 15 mins.

rptb1 avatar Feb 01 '23 00:02 rptb1

Executing proc.review.plan.

  1. proc.review.plan.time: Less than 100 lines have changed. The change is low risk, since it only introduces some barriers, but reviewing them will need a good understanding of the source documents, which are approx 6500 words. So 100 lines @ 10 lines/min is about 10 mins of checking the code lines, plus 50 mins for source checking, so 1 hour for checking.

  2. proc.review.plan.roles: @thejayps @rptb1 and @UNAA008 reviewing. Let's have at least two people doing proc.review.role.check.source, and one other for all the others.

  3. proc.review.plan.schedule: Review will start 2023-02-02 15:00 UTC for about 2h.

Planning took 15 mins.

rptb1 avatar Feb 01 '23 13:02 rptb1

Executing proc.review.kickoff

  1. Start 15:05.
  2. proc.review.ko.roles: @UNAA008 and @thejayps concentrate on source consistency, @rptb1 does the rest and is proc.review.role.leader.
  3. Reconvene for logging at 16:00.

Kickoff took 15 minutes.

rptb1 avatar Feb 02 '23 15:02 rptb1

Executing proc.review.log

  1. Start time 16:03.
  2. proc.review.log.sums: @UNAA008 0M, 1m. @thejayps 0M, 2m. @rptb1 2M, 3m.
  3. NI for proc.review: You're not restricted to thinking about defects introduced by a change. Find any major defects.
  4. NI for proc.review: The editor can escalate stuff and still get their change through. Review exit doesn't say "did you fix everything mentioned" it says "did you take action..."
  5. "The road to hell is paved with correctness preserving diffs."
  6. NM: @thejayps If this test does not fulfill its purpose then there isn't one that does the job.
  7. Nm: @thejayps Tests are not readily identifiable based on their location within the directory structure. You need knowledge to know what might be a test and what is actually part of the MPS library and what isn't.
  8. End time 16:45. Brainstorm at 16:55.

Logging took 42 mins.

rptb1 avatar Feb 02 '23 16:02 rptb1

Executing proc.review.brainstorm

  1. Start time 16:55.
  2. https://github.com/Ravenbrook/mps/pull/83#pullrequestreview-1281321427 point 4: @thejayps Regularly, randomly select documents, and argue for their deletion. @rptb1 I like this because it combats accretion. @rptb1 Similar: every document should have a review date, that review should include your point, and purpose checks, etc. Not just "does it work". Would need resource allocation. Regularly put @thejayps in charge of regularly doing things. :) There is a rule that we don't copy and paste code, but amcssth is totally copy-pasted. I, @rptb1 broke that and we're paying the price. How'd that happen? @UNAA008 says perhaps it was lack of rigor.
  3. https://github.com/Ravenbrook/mps/pull/83#issuecomment-1414047195 point 6: What jobs? What jobs are needed? How can we tell if any are not being done? There's a gap there. (That's just broadening the issue.) @UNAA008 Whenever you write a non-trivial function you should write test code alongside it. Connecting that code would justify that code, and we'd know there was coverage. Some sort of coverage mapping. @thejayps Tests need to be driven by requirements. There should be links between the two. @rptb1 These are breaks in the chain of justification: the "why links". Maybe we could be better at enforcing that any new stuff (or changed stuff) has an unbroken "why" chain. Especially new files.
  4. @thejayps Brainstorm doesn't seem to permit new issues to come up. We could mention this possibility in proc.review.brainstorm. Capture and move on.
  5. @UNAA008 What about stuff that occurs to me a day later? proc.review could mention that possibility. The conversation should continue, but how? Or maybe a followup meeting or one-to-ones by the leader. Or the proc.review.role.improver could ask people.

Possible new issues:

  • there's no index of tests
  • design.mps.tests doesn't talk about individual tests
  • tests designs are missing

End at 17:25. 30 mins bang on!

rptb1 avatar Feb 02 '23 17:02 rptb1

We haven't assigned proc.review.role.editor or proc.review.role.improver and @thejayps is away next week. We will perhaps do some of these as a group for training purposes and pair programming.

rptb1 avatar Feb 02 '23 17:02 rptb1

4. @thejayps Brainstorm doesn't seem to permit new issues to come up. We could mention this possibility in proc.review.brainstorm. Capture and move on.

Fixed in 8848b707e10253b5dd3b4da25972305137f4adcd

rptb1 avatar Feb 19 '23 17:02 rptb1

5. @UNAA008 What about stuff that occurs to me a day later? proc.review could mention that possibility. The conversation should continue, but how? Or maybe a followup meeting or one-to-ones by the leader. Or the proc.review.role.improver could ask people.

Comment written by @rptb1 on @thejayps session:

So this is a bit meta-circular: @thejayps asked me about improving proc.review to mention capture of brainstorming thoughts that come up later. (And his asking me is an example of that!) I'm writing this to demonstrate what I would do about that:

  1. located the relevant brainstorm (above) https://github.com/Ravenbrook/mps/pull/83#issuecomment-1414104526
  2. I checked that the review log (this pull request conversation) is still active (marked pending) and so will be seen again at some point
  3. I added my thought (this comment).
  4. The above could be summarised in proc.review as something like "If you think of something after review then append to the review log as long as it hasn't been closed, otherwise raise an issue."
  5. Done!

thejayps avatar Feb 21 '23 10:02 thejayps