ocaml
ocaml copied to clipboard
Simplifications and fixes to multicore systhreads implementation (3/3)
Follow-up to #11271, #11385 and #11393 (cc @Engil, @kayceesrk).
Targets 5.1 (does not seem to fix important bugs). Details in the individual commit logs. Can you please have a look at the commit that removes dead code, in case this was unintended and hides a bug? I also added a FIXME regarding a behaviour from old systhreads that is not yet implemented in multicore (see comments below).
This PR lives on top of #11393, so only the 5 topmost commits belong to it.
No change entry needed.
I have rebased on top of #11393, so keep in mind that only the 5 topmost commits belong to this PR.
I rebased on top of #11385 and added some new minor fixes and clarifications. It still only contains less important fixes targeting 5.1. For clarity I convert it as a draft, we should review it when earlier PRs have been merged.
I added some more FIXMEs that I think are worth looking into.
(of course a round of precheck is welcome)
In case anyone wonders (I just did), the current "PR path" appears to be:
- already merged:
- #11271, which was then reverted
- #11390
- #11403
- next to review:
- #11385
- further PRs:
- #11386 (the present PR)
This should be rebased and made ready for review. While adapting to the new way systhread is initialized on every thread, I stumbled into a rabbit hole regarding error handling inside Domain.spawn and found other bugs with signals. I am not yet sure how to go about it, I just wanted to post an update.
I have rebased on top of trunk. This could do with a round of review. Instructions for reviewers:
- This is best reviewed commit-per-commit, with the help of additional detail in each commit log.
- If someone knows systhread well, please review that commit "Remove dead code" does not reveal a regression in functionality compared to OCaml 4 (code that became dead for wrong reasons).
- A few FIXMEs are added, coming from my review of the OCaml 5 systhread implem. and are about possible regressions wrt. OCaml 4. I do not intend or expect to have to fix these FIXMEs in this PR.
- Some FIXME concern possible changes/regressions wrt. the "best effort" to support
forkon a single domain. - One FIXME is about the delay for joining the tick threads on shutdown (25ms on average). On OCaml 4 there was no wait. When reviewing the code, it appeared to me that a solution could be to interrupt the
sleepsystem call in the tick thread using the self-pipe trick withpselect(edit: or simply usingpthread_cond_timedwait).
- Some FIXME concern possible changes/regressions wrt. the "best effort" to support
- One FIXME is added to avoid delaying this PR further: the FIXME in
domain_thread_funcis about proper error handling inside domain initialization. Domain initialization was moved from OCaml to C code at the end of 5.0 development, which has created new issues. Errors during domain initialization should be converted into exceptions; currently this probably crashes or deadlocks. As said in my previous comment, fixing this has revealed to be a bit of a rabbit hole, it is doable but is better done separately in order to make progress on this PR. However, reviewers are free to disagree and refuse the addition of a FIXME comment in the code (in which case we can discuss solutions here).
Do we have a volunteer to review this PR? I have assigned myself as "person in charge of making sure we don't forget about it completely", but I would be happy to let someone else do the review.
I'm starting to review this PR, but I have a couple of difficulties in the way this PR and others like #11307 are organised which is making it harder for me to review the PR(s).
My main difficulty is that the PRs suggest many improvements that seem independent but are organised as several commits, each of which has to be read one after the other. The details of the improvements are in the commit messages. This has several downsides that make it harder to review such PRs.
- On Github, the discussions on several unrelated improvements get mixed together in the same PR thread which makes it harder to follow along.
- If some of the early improvements are OK, but the follow-ups need changes, the whole PR gets blocked. As a reviewer, I am not able to checkpoint my approval on individual commits. I'm forced to read the entire commit chain in order to see how these interact with each other.
- IINM, this is also different from how OCaml compiler development operates today, which is a PR for every improvement that's independent.
Personally, I really find it hard to review PRs commit by commit.
May I recommend @gadmm to consider having separate PRs for each improvement so that it makes reviewing easier? Small improvements will go much faster with a PR per commit. If they are inter-related, please mention that it builds on top of another PR. Also, possibly consider marking the PRs that are targets in a dependence chain as drafts so that reviewer time is not wasted on reviewing PRs which may change based on whether the source PRs require changes. Reviewer time is precious and limited, and it would help if the PRs are organised to be reviewer friendly.
Except for this one comment on the second commit (https://github.com/ocaml/ocaml/pull/11386/commits/142beb885999e07d5273fb8958811e5c077b0e9b#r1207685036), the other changes look fine to me.
@gadmm it looks like the present PR has been reviewed three weeks ago, and would be mergeable in trunk after you rebase and take review comments into account. Are you available to do this in the next few weeks? If not, would you prefer if a maintainer took care of the rebase?
As I understand it, this is not scheduled for 5.1 so there is no rush.
As I understand it, this is not scheduled for 5.1 so there is no rush.
It would be useful to get this towards a merge soon as the code has already gone out of sync with trunk and needs to be rebased. Merging it faster would mean that fixing conflicting changes would be the responsibility of folks who propose newer PRs ;-)
Thanks for the gentle pressure, I am well aware and this has been at the top of my pile. As you can guess my own time is constrained, and I also received many solicitations from the OCaml github concurrently. I have been prioritizing 5.1. (Even though the release schedule is kept private for some odd reason, I have some sense that 5.1 is the priority at this time.)
Still on the topic of what we could do better to make sure that aging PRs end up getting merged, would monthly reminders be received positively? Would they help?
The "fixup" comments are for reviewing my fixes. They will be squashed later.
Except for this one comment on the second commit (https://github.com/ocaml/ocaml/commit/142beb885999e07d5273fb8958811e5c077b0e9b#r1207685036), the other changes look fine to me.
Force push lost my comment ?! :-( I don't know what my comment was, but I am assuming that it was fixed. If @gadmm can confirm, I think the PR is good to go.
The comment is still there right above (https://github.com/ocaml/ocaml/pull/11386#discussion_r1207685036), along with my reply, but it looks like your pointer was invalidated. (I got the still-valid link by doing "copy link" directly from the PR discussion, which seems to give a different URL than yours.)
Let me know about the last point, and then I will clean-up the PR.
Let me know about the last point, and then I will clean-up the PR.
LGTM. Please also add a changes entry. I'll aim to merge this right after that.
Rebased & added a Changes entry (mostly to credit the work done by reviewers, as no change entry was needed for these minor changes & fixes IMO).
Edit: as already mentioned in a different discussion, the general advice on the organisation of this PR, which mentions various points that were already applied, and did not seem related to the delay in starting the review, ended up discussed in private.