coq icon indicating copy to clipboard operation
coq copied to clipboard

[CI] Better support for overlaying submodules

Open JasonGross opened this issue 1 year ago • 2 comments

The overlay infrastructure now supports declaring that the root directory of some builds is a subfolder of a given repository. This allows overlay-writers to override the repository and branch (and override the subfolder with "") to indicate that the repository should be cloned from elsewhere.

This should hopefully provide a relatively seamless experience for overlay writers who need to update submodules of projects like fiat-crypto. In service of that, we split off ci-{coqutil,kami,riscv_coq,rupicola}

Overlays: https://github.com/coq/coq/blob/master/dev/ci/user-overlays/README.md

JasonGross avatar Mar 01 '24 08:03 JasonGross

@coqbot run full ci

(@coq/coqbot-maintainers why does coqbot not automatically run full CI if I create the PR with the request: full CI label?)

Edit: Oh, did it just take a bit before it ran it?

JasonGross avatar Mar 01 '24 08:03 JasonGross

The job library:ci-fiat_crypto_legacy has failed in allow failure mode ping @JasonGross

coqbot-app[bot] avatar Mar 01 '24 08:03 coqbot-app[bot]

This is not as easy as you make it sound. We'd have to duplicate the CI infrastructure (quadratically in the depth of the dependency chain) to ensure that we don't merge any breaking changes into those branches. Whereas currently we can just rely on dependabot and local project CIs.

@andres-erbsen do you have ideas or thoughts?

I agree that the complexity here is not ideal, but it allows submodule dependencies to be mostly transparent across all projects for constant complexity overhead.

JasonGross avatar Mar 09 '24 03:03 JasonGross

I must be misunderstanding, as what @proux01 proposes sounds like the current state of affairs to me (where coq-master is called master and "whatever mechanism" is dependabot). This PR is to enable Coq CI to pass using an overlay before the bedrock2 submodule in-fiat crypto is updated by dependabot (or a Coq developer), bypassing downstream CI that uses older versions and etc. FWIW, I don't like the CI complexity either and would prefer if all these projects were a monorepo, but alas. Evaluating the concept behind this PR on its own, I am in favor on the basis that the state after this PR would be more consistent with what I think Coq developers expect from CI/overlay infrastructure than the state before.

andres-erbsen avatar Mar 09 '24 04:03 andres-erbsen

I must be misunderstanding, as what @proux01 proposes sounds like the current state of affairs to me

We don't maintain the contract that, e.g., bedrock2 master always is compatible with fiat-crypto master. If we did, we could split out the dependencies and point them at the master branches of the corresponding repos. This PR is to enable the combination of:

  • submitting overlays without waiting for a submodule bump merge
  • not enforcing the strict contract that {bedrock2,coqutil,rupicola,fiat-crypto,coq-riscv,kami} master must always all be compatible
  • ensuring that Coq's CI does not randomly break because of incompatibilities in the tip of master of {bedrock2,coqutil,rupicola,fiat-crypto,coq-riscv,kami}

JasonGross avatar Mar 09 '24 04:03 JasonGross

I need more time to look at this, but I'm principle I feel good about going the direction @JasonGross proposes.

ejgallego avatar Mar 13 '24 15:03 ejgallego

I've added this to the upcoming Coq call

JasonGross avatar Mar 14 '24 00:03 JasonGross

what @proux01 proposes sounds like the current state of affairs to me (where coq-master is called master and "whatever mechanism" is dependabot).

Unfortunately not, the main issue currently is that the bedrock branch coq-master doesn't exist, it's information (some git commit hash) is burried in some gitsubmodule file in another repo (fiat_crypto). The way Coq CI works for almost everything but fiat_crypto is the following: when we have a dependency Coq <- A <- B, we have

  • A enforces the invariant that some of its braaches, let's call it coq-master (it could be master or any other branch, we don't care) keeps compiling with Coq's master branch
  • B enforces the invariant that some of it's branches, let's call it coq-master (it could be master or any other branch, we don't care) keeps compiling with Coq's master branch and A's coq-master branch. A typical way to ensure this is to have B's coq-master branch tested in the CI for A's coq-master branch.

The coq-master branches can be master or ant other branch, in which case it is periodically merged with the master branch (for instance for each Coq release). The overlays from Coq then simply go to the coq-master branch.

proux01 avatar Mar 14 '24 12:03 proux01

Okay, I understand what difference you're getting at now -- thank you for explaining. I am in principle open to either option here, but I would first like to clarify my perspective on the current situation.

Branches "master" across all mit-plv repos (and kami's rv32i) being compatible in the same sense as you proposed for coq-master is a goal for us, and we do spent substantial effort on maintaining that invariant, including planning work around it. Most of the time, the submodule does point to the dependency's master, or a commit very close to it; the difference only becomes important to us during large-scale non-backwards-compatible refactorings in e.g. bedrock2. While I do try to land coupled changes to bedrock2 and fiat-crypto together, I think there's inherently going to be a moment where the master branches do not build together. The same is true for overlay-requiring Coq changes and external plugins such as mit-plv/rewriter.

So there's a tradeoff of two different kinds of breakage: when I work with the Coq CI, there is almost invariably a project or another that does not pin its dependency and fails to build because it has not been updated to work with changes in the dependency. The goal of the submodule pins in mit-plv repos is to avoid that scenario (both in Coq CI and our own work). However, the same pin also means that changing a dependency (e.g. coqutil) when Coq needs it to change requires a rollout to all levels of users (e.g. fiat-crypto). I personally have become used to sending 5 stacked PRs quite often, and consider it to be a much smaller burden than dealing with surprise failures due to uncontrolled dependency changes, but I can see how others can disagree here. For one, my workflow is substantially dependent on a computer where I can make -f Makefile.ci -j ci-all with impunity and a script that diffs all git repos in _build_ci. I would recommend my workflow and kinda wish that this is where Coq was headed, but if other Coq developers disagree, I do think we could convince other mit-plv project maintainers to change to a concrete plan we propose. (I'm not quite sure what it would be -- would we want bedrock2 CI to build fiat-crypto with overlays like Coq CI does for plugins?)

andres-erbsen avatar Mar 14 '24 17:03 andres-erbsen

I'm not quite sure what it would be -- would we want bedrock2 CI to build fiat-crypto with overlays like Coq CI does for plugins?

I think the most hands-off alternative here is to have a branch ("tested" / "rv32i-tested"?), and every commit to master (but not PRs) test the build with all reverse dependencies and push to the tested branch only if things build. If we want to avoid duplicating CI infrastructure we can set up all the repos to have opam files and just do opam pin . and then opam install <reverse deps>. This would result in quadratic CI overhead if combined with submodules (every project is going to build every project to update the chain of submodules), and also still results in desynchronization when trying to land coupled changes, though. I don't see a way to deal with coupled changes on CI in this style without overlays.

JasonGross avatar Mar 14 '24 18:03 JasonGross

Are you suggesting we'd also take PRs straight to tested? If not, I don't think this satisfies @proux01's request.

andres-erbsen avatar Mar 14 '24 18:03 andres-erbsen

Are you suggesting we'd also take PRs straight to tested? If not, I don't think this satisfies @proux01's request.

Ah, yeah, I missed that part. You're right, it doesn't satisfy the requirement.

JasonGross avatar Mar 14 '24 18:03 JasonGross

would we want bedrock2 CI to build fiat-crypto with overlays like Coq CI does for plugins?

From Coq point of view, you can manage your CI in every way you want, as long as you maintain the invariants that all branches registered in Coq CI keep building together.

Ah, yeah, I missed that part. You're right, it doesn't satisfy the requirement.

Sorry, would you mind explaining the use case that appears problematic? I'm a bit lost here.

proux01 avatar Mar 15 '24 09:03 proux01

The way the Bedrock2 maintainers have chosen to manage their CI is to not test against users and instead perform catch-up for non-backwards-compatible changes when bumping the bedrock2 submodule in downstream developments. If we want that to change, it is on us to have a proposal. I may be wearing two hats here, and I do think some change would likely be desirable, but I can't just go to the team and tell them they now have to figure it out. I bring up Coq overlays exactly as an example of how what you are asking for is nontrivial. This Coq PR here is a proposed solution to about the same problem, and it seems much simpler to me. I am open to alternative proposals, but to evaluate them we need enough detail to understand the impact on different contributor workflows (Coq overlays, compatible bedrock2 changes, incompatible bedrock2 changes, etc).

For example, Jason was suggesting that the coq-master branch you wish for would be automatically fast-forwarded from master whenever the invariant you envision is satisfied, but you also want mit-plv/* to accept overlay PRs to the same branch, which wouldn't be compatible with fast-forwarding.

andres-erbsen avatar Mar 15 '24 12:03 andres-erbsen

The way the Bedrock2 maintainers have chosen to manage their CI is to not test against users

Whereas Coq CI is by definition the exact contrary: testing users. Maybe the two philosophies are intrinsically incompatible?

This Coq PR here [...] seems much simpler to me.

Well, I certainly understand it feels simpler from your point of view since it transfers, at least part of, the complexity on our side. That's exactly what we are complaining about ;)

For example, Jason was suggesting that the coq-master branch you wish for would be automatically fast-forwarded from master whenever the invariant you envision is satisfied, but you also want mit-plv/* to accept overlay PRs to the same branch, which wouldn't be compatible with fast-forwarding.

Why so? It's your branch, you do whatever you want with it, as long as the invariant is maintained. Maybe you would need merge or rebase instead of just fast forwarding, but I don't see the issue.

proux01 avatar Mar 15 '24 13:03 proux01

Well, I certainly understand it feels simpler from your point of view since it transfers, at least part of, the complexity on our side. That's exactly what we are complaining about ;)

No, this is not what I mean. Obviously we should evaluate the overall complexity, combined across Coq, bedrock2, and their users. What I am saying is that I don't see how pushing this problem downstream to bedrock2 would help. In particular, I don't think we should hold up this PR in favor of a solution elsewhere that is yet to be designed.

andres-erbsen avatar Mar 15 '24 15:03 andres-erbsen

Well, I certainly understand it feels simpler from your point of view since it transfers, at least part of, the complexity on our side. That's exactly what we are complaining about ;)

No, this is not what I mean. Obviously we should evaluate the overall complexity, combined across Coq, bedrock2, and their users. What I am saying is that I don't see how pushing this problem downstream to bedrock2 would help.

To be more concrete, this PR does two things:

  1. add infrastructure for handling submodules transparently
  2. factor out the dependencies of fiat-crypto

The diff-stat of (1) is +33,-1, with an additional +21,-1 of documentation.

To implement the proposal I made above, I estimate a cost of about 300 lines (20 lines of CI + 40 lines of opam files, times 5 repositories). And it would add 10+ hours of CI time to every update, if we want to keep submodules.

The logic of this PR is not especially complex, and I don't expect it to be particular hard to maintain. @proux01 Can you be a bit more concrete about the cost that you're concerned about with this PR?


Why so? It's your branch, you do whatever you want with it, as long as the invariant is maintained. Maybe you would need merge or rebase instead of just fast forwarding, but I don't see the issue.

The issue is that the following invariants seem incompatible:

  1. Coq's CI tracks branches
  2. The collection of branches that are tracked by Coq's CI always build together / consistently
  3. Overlays are made to the branches tracked by Coq's CI
  4. Projects have a primary branch where development is done
  5. Project developers do not have to manually manage synchronization between the primary development branch and the branch tracked by Coq's CI
  6. There is no risk of merge conflicts that have to be manually resolved between the primary development branch and the branch tracked by Coq's CI

In particular, insofar as I can tell, 6 implies 7. Pull requests are only accepted against the primary development branch of the project

which is incompatible with 3.

JasonGross avatar Mar 15 '24 19:03 JasonGross

Thanks for all the feedback folks, indeed in the CI we took inspiration from the experience some large codebases (Jane Street's OCaml's libs for example) in managing their setup. The way they frame the problem is indeed similar on how you would frame the problem of consistency in a distributed system.

IIUTC, in that case you have to manage autonomy of "branches" vs "reconciliation cost". There is ample evidence that reconciliation cost tends to be exponential in the size of the divergence (in this case delta), so we favor a kind of "greedy" strategy where diverging in CI branches will imply that the people in charge of the CI development will have to bear the reconciliation cost. This has indeed some complex tradeoffs.

I'm unaware of easy solutions to this problem, but indeed, I think we can improve the status quo and evolve our model.

ejgallego avatar Mar 17 '24 16:03 ejgallego

To be more concrete, this PR does two things:

1. add infrastructure for handling submodules transparently

Unfortunately, as shown by your own doc, it doesn't really achieve it.

The issue is that the following invariants seem incompatible:

1. Coq's CI tracks branches

2. The collection of branches that are tracked by Coq's CI always build together / consistently

3. Overlays are made to the branches tracked by Coq's CI

4. Projects have a primary branch where development is done

5. Project developers do not have to manually manage synchronization between the primary development branch and the branch tracked by Coq's CI

6. There is no risk of merge conflicts that have to be manually resolved between the primary development branch and the branch tracked by Coq's CI

Sure but 5. and 6. are in now way requirements of Coq CI. Those are requirements you are adding yourself. Their net effect is to transfer some loadwork from you to us, which seems to be the core of the issue. (BTW, I'm not even sure that 5. is that true, you can probably get it done automatically most of the time)

proux01 avatar Mar 18 '24 12:03 proux01

To be more concrete, this PR does two things:

1. add infrastructure for handling submodules transparently

Unfortunately, as shown by your own doc, it doesn't really achieve it.

Are you seriously complaining about the fact the determining what branch to make the pull request against can't be read off the overlays file? And using this complaint to block this PR in a roundabout way rather than simply requesting a trivial fix such as adding a comment to any line using a submodule indicating the branch to make the PR against?

JasonGross avatar Mar 18 '24 14:03 JasonGross

Sure but 5. and 6. are in now way requirements of Coq CI. Those are requirements you are adding yourself. Their net effect is to transfer some loadwork from you to us, which seems to be the core of the issue

These are reasonable requirements to have of our CI. Are you being honest about the core issue here, or is the core of your objection that the project this issue is about is part of the bedrock2/fiat-crypto ecosystem? I think that 30 lines of straightforward code in a part of the system that is independent of Coq itself is a very reasonable cost for the Coq repository to bear to avoid hundreds of lines of code and the ongoing cost of hours of developer time in users of Coq. It seems to me that Coq frequently adds much more complicated features for much smaller gains in ease of use and I don't generally see you or anybody else objecting to such additions. I feel frustrated by what seems to me to be biased or deceptively- / incompletely- presented objections.

JasonGross avatar Mar 18 '24 18:03 JasonGross

I think that 30 lines of straightforward code in a part of the system that is independent of Coq itself is a very reasonable cost for the Coq repository to bear to avoid hundreds of lines of code and the ongoing cost of hours of developer time in users of Coq.

I fully agree @JasonGross , and indeed let me be clear that your work on this is very much appreciated and I'm sure we will understand and address all the challenges we have in managing CI deps and go ahead with this proposal.

ejgallego avatar Mar 18 '24 23:03 ejgallego

Having a bit more shell doesn't worry me; it is true that adding more shell code is a problem for Coq developers because it is hard to maintain; maybe we could go ahead and start using OCaml to manage this kind of stuff? I dunno.

ejgallego avatar Mar 18 '24 23:03 ejgallego

These are reasonable requirements to have of our CI.

Maybe, I'd just like to understand why. Fiat-crypto is pretty much the only thing in CI demanding such things. If it were by far the most elaborate thing in CI that might be understandable, but there are plenty of other elaborate developments in our CI and they all seem to live a happy life without such demands. So without clear detailled explanations, it's hard not to wonder whether you are not just making some potential development process issue on your side drip on us. You might be right that hacking Coq CI rather than changing anything on your dev process might overall be the best solution. But this must come with clear, thourough, explanations of that dev process, and what makes fiat-crypto uniquely require that, so that it's better understood and accepted.

proux01 avatar Mar 19 '24 15:03 proux01

Anyone have any ideas what's up with

coqc -R src QuickChick -I plugin QuickChickInterface.v
Error: Can't find file ./QuickChickInterface.v

?

JasonGross avatar Mar 21 '24 06:03 JasonGross

cc @lemonidas, @Lysxia, @liyishuai, any idea why QuickChick might be failing with

coqc -R src QuickChick -I plugin QuickChickInterface.v
Error: Can't find file ./QuickChickInterface.v

? Was there a recent change to master? Or would removing all files and folders named .git somehow result in this?

JasonGross avatar Mar 21 '24 06:03 JasonGross

Oh yes, I recently changed the build script in QuickChick, and forgot to update it on here.

To be fixed by #18826

Lysxia avatar Mar 21 '24 08:03 Lysxia

This doesn't work locally, I made a bedrock2 overlay, pushed, rm -rf _build_ci, make ci-fiat_crypto DOWNLOAD_ONLY=1

It looks like the overlay was checked out but then the git_download for rupicola reset it.

https://github.com/SkySkimmer/coq/tree/testbr https://github.com/SkySkimmer/bedrock2/tree/testbr

Thanks, this is fixed now. The problem was that I declared transitive submodules as subprojects of fiat_crypto, rather than as subprojects of their direct parents.


I've also removed the fancy .git logic and am instead sharing parent project folders only when the projects are checked out not on the CI. The CI setup is deeply relying on sibling jobs having disjoint directory structures so that .v timestamps are not scribbled over, and I don't see any way to combine this with downloading repositories in jobs of siblings.

JasonGross avatar Mar 21 '24 20:03 JasonGross

CI passes again

JasonGross avatar Mar 23 '24 02:03 JasonGross

Maybe squash a bit, there's several "revert" commits in there

SkySkimmer avatar Mar 25 '24 14:03 SkySkimmer