Enable merge queue in github to avoid commit confliction.
Describe the bug
See https://github.com/apache/arrow-datafusion/pull/6875 for details. We can enable merge queue to avoid such kind of conflicts. You can find out details here: https://docs.github.com/en/repositories/configuring-branches-and-merges-in-your-repository/configuring-pull-request-merges/managing-a-merge-queue
To Reproduce
No response
Expected behavior
No response
Additional context
No response
This would be awesome to do -- thank you for the suggestion @liurenjie1024
The repository settings are controlled by https://github.com/apache/arrow-datafusion/blob/main/.asf.yaml
Perhaps someone can research how to change that file to enable the merge queue
Cool, let me have a try.
Oh, it seems that only apache committers can view the infra page: https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features and I can't help much. Anyway, merge queue is a great way to reduce merge conflicts and it would be helpful to enable that.
Here is the page: INFRA-Git-.asf.yamlfeatures-120723-1012-2450.pdf
Seems that merge queue is not supported by apache git infra 😂, let's close this first.
😢
I propose we re-open this since this is clearly a feature that we want, it's just blocked by ASF at the moment.
The docs and the relevant management code are now openly available:
https://github.com/apache/infrastructure-asfyaml
So whoever wants to make this a reality for DataFusion can start there.
Based on ASF Slack, I believe MQ aren't currently supported in .asf.yaml because there's no API support (https://github.com/orgs/community/discussions/50893)
Based on ASF Slack, I believe MQ aren't currently supported in
.asf.yamlbecause there's no API support (github.com/orgs/community/discussions/50893)
Following https://github.com/orgs/community/discussions/50893#discussioncomment-7792552 and https://github.com/orgs/community/discussions/77614#discussioncomment-10598205 , I think this MIGHT be outdated.
We ll need to make CI ready for the change and then open an email thread, will do that
We ll need to make CI ready for the change and then open an email thread, will do that
I think the mailing list thread should basically say "we are discussing enabling the merge queue" and point at this ticket
Mailing List note was posted: https://lists.apache.org/thread/2kxhfgv10vhqt3y63lxwcjtlhxc55c4y ✅
I'd like to test it first with a smaller repo (datafusion-sqlparser-rs) and move to the main one if everything works fine.
@iffyio, since this would impact you the most, being the most active maintainer of sqlparser-rs, would you be willing to try having merge queues enabled for that repo?
@alamb yeah we can enable it on the sqlparser repo!
Great infra team has given us temporary access to merge things bypassing the CI checks
Hence we should be able to tests the queues in sqlparser-rs 🙂: https://github.com/apache/datafusion-sqlparser-rs/pull/2007
For transparency, here's the migration plan (https://issues.apache.org/jira/browse/INFRA-27154?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel)
From the mailing list
This should hopefully help resolve some of the problems we have now:
- sometimes you want to approve a PR, but CI hasn't finished yet, so you have to wait up to 30 minutes before you can hit "merge"
- the "merge" button may be available, but if the target branch was updated in the meantime, CI may then fail
The first should rarely be an issue in a distributed OSS projects. I've seen folks deliberately delaying merge by hours or days to give others a chance to chime in before the PR is merged.
How big of the problem is the second bullet?
Also, do we have any flaky tests? How will flakiness interact with merge queue?
not saying we should try it out. rather, trying to understand how important is the problem vs potential cost.
the "merge" button may be available, but if the target branch was updated in the meantime, CI may then fail
This "may" happen, but TBH I've rarely seen this. On the positive side as a maintainer I can just hit the queue button and walk away without having to worry about semantic/test conflicts (i.e. the ones not detected by GIT) on the target branch. That's also why the merge queue is rather efficient by default because it can test multiple entries of the queue, assuming that the previous entries will pass CI. If that assumption fails, entries will be retested with the failed entry removed.
What I am wondering though: what happens to commits in PR that are added/pushed AFTER the PR was added to the queue? IIRC that removes the PR from the queue (as one might expect, because we don't want people to sneak in changes) but I cannot find any first-party sources for that behavior.
Also, do we have any flaky tests? How will flakiness interact with merge queue?
I'm not aware of any and I think we should be rather strict with them if they come up. Flaky tests just act like normal test failures.
@blaginin has made a PR to SQLParser to test out how it works:
- https://github.com/apache/datafusion-sqlparser-rs/pull/2007
Merge Queue is working in SQLParser 🥳
The plan is to test it for a few days, merge some PRs, and then switch to Datafusion if contributors will be happy with the experience
Marked some steps as required in the main repo and asked ASF to enable merge queues for us here as well 🙂
so exciting!
MQ just got enabled, I believe that should put the required checks in place https://github.com/apache/datafusion/pull/17538
Thank you for pushing this along @blaginin (and debugging the issues that arise)
Thank you! After https://github.com/apache/datafusion/pull/17629 is merged, I'll go through all the PRs and merge the once that got approved and we wanted to merge them but couldn't because of the check
Also, I think in the future we need to create a [private?] sandbox project for experiments within the ASF GitHub org. I created my own, but apparently there are organization-level rules that are impossible to test if you just fork the repo :(
yeah, I agree testing stuff on github is tough
I think we can create our own datafusion repos within the apache org via some self service API -- how about apache/datafusion-sandbox ?
https://infra.apache.org/version-control.html#create
datafusion-sandbox is great! also i guess we should make it private not to confuse potential users
I'm not sure I have permissions to create that, do you?
I'm not sure I have permissions to create that, do you?
Yes, I think the PMC chair needs to request it. I will do so.
I think it would be better if it is public with a clear README
https://github.com/apache/datafusion-sandbox is now available
Thank you! My plan is to fork the main project there, but instead of readme put a big warning that you should never use this project, this is our CI sandbox