project-ideas
Moving Bugzilla to Github
Description
Moving bugzilla issues to github
What are rough milestones of this project?
- build automation
- run in parallel
- validate everything is good
- shut bugzilla down
How does this project help the D community?
Bugzilla is not really state of the art anymore. Read more about it here: https://www.python.org/dev/peps/pep-0581 and http://pyfound.blogspot.com/2019/05/mariatta-wijaya-lets-use-github-issues.html
Recommended skills
Point of Contact
References
<NG discussions, GitHub PRs, Bugzilla issues, ...> https://forum.dlang.org/post/[email protected]
build automation
What needs to be built?
the tool that moves everything from bugzilla to github
The tool has existed for quite a while: https://github.com/wilzbach/bugzilla-migration and has even been tested: https://github.com/wilzbach/tools-test/issues. I never got formal approval for this migration, which was the only blocker :/
There are links to Bugzilla in the source code, and some places just reference the issue number. A bunch of tests are also just named after the issue number. Not sure if your script takes that into consideration?
The script was never fully completed, as it was a formal dead end. It was just a test to show that a migration is more than feasible and roughly how it would look.
Does github offer a means for backing up the issue database?
Yes: https://developer.github.com/v3/migrations/orgs/#download-an-organization-migration-archive
@braddr has Ok'd this.
Is there a 1:1 mapping between bugzilla issue URLs and the eventual github ones? I ask because the n.g. archives should have the URLs remapped.
Also the URLs in the dmd source code.
I assume we would set up a server that does the redirects.
Does GitHub have a way to allow anyone to triage issues? At the moment anyone can get started with cleaning up Bugzilla, but that won't be the case with GitHub. Not a big deal IMO, but I didn't see it mentioned.
In any case, very happy to see some progress on this!
GitHub introduced a new permission level "triage" for exactly this problem. I am not sure whether this can be applied to everyone, but considering that only a handful of people actually triage, we could set up a very liberal "triage user group" and invite anyone interested. Obviously we would need to modify the dlang-bot a bit to ensure that the auto-merge label has no effect for triage users, but that shouldn't be hard.
More about this: https://help.github.com/en/articles/repository-permission-levels-for-an-organization
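A minimal sketch of the dlang-bot change described above, assuming the bot can look up the repository role of whoever applied a label. The role names are GitHub's repository roles; the function name and the exact label text "auto-merge" are hypothetical and not taken from the bot's code.

```python
# Roles that may trigger a merge through the label. "triage" and "read"
# are deliberately left out, so a liberal triage group cannot merge code.
ROLES_ALLOWED_TO_MERGE = {"write", "maintain", "admin"}


def label_triggers_auto_merge(label, role):
    """Return True only if this label, applied by a user with this role, should merge."""
    return label == "auto-merge" and role in ROLES_ALLOWED_TO_MERGE
```

With this check, a triage user can still apply the label, but the bot would ignore it.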
And yes, we would obviously save the 1:1 mapping between Bugzilla and GitHub issues and set up a simple redirect server, so that all old Bugzilla URLs are redirected to their respective GitHub issues.
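As a rough sketch of the lookup such a redirect server would need, assuming the mapping is saved as a table from Bugzilla ID to GitHub URL. The table entry, the example-org URL, and the short URL form are placeholders for illustration, not real data.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping saved during the migration: Bugzilla ID -> GitHub issue URL.
BUGZILLA_TO_GITHUB = {
    12345: "https://github.com/example-org/example-repo/issues/1",
}


def redirect_target(bugzilla_url, mapping=BUGZILLA_TO_GITHUB):
    """Return the GitHub URL for an old Bugzilla URL, or None if unmapped."""
    parsed = urlparse(bugzilla_url)
    ids = parse_qs(parsed.query).get("id", [])
    if not ids:
        # Assumption: a short form such as https://issues.dlang.org/12345 is also in use.
        ids = [parsed.path.strip("/")]
    try:
        bug_id = int(ids[0])
    except ValueError:
        return None
    return mapping.get(bug_id)
```

The server itself would then answer with an HTTP 301 to the returned URL and fall back to the read-only Bugzilla page when the result is None.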
BTW one big argument against the migration was that the GitHub API is heavily rate-limited and we can't export the issues anymore. With the GraphQL API that's no longer a problem and we can easily export everything with a few paginated queries, e.g.
query {
  repository(owner:"wilzbach", name:"tools-test") {
    issues(last:100) {
      edges {
        node {
          title
          url
          author {
            login
          }
          closed
          bodyText
          createdAt
          closedAt
          number
          comments(first: 100) {
            edges {
              node {
                author {
                  login
                }
                bodyText
                createdAt
              }
            }
          }
        }
      }
    }
  }
}
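A sketch of what "a few paginated queries" could look like in practice, assuming cursor-based pagination over the issues connection. The helper names `export_issues` and `github_run_query` are made up for illustration, the nested comments connection is left out for brevity, and nothing here measures how the rate limit behaves on a large repository.

```python
import json
import urllib.request

QUERY = """
query($owner: String!, $name: String!, $cursor: String) {
  repository(owner: $owner, name: $name) {
    issues(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes { number title url closed createdAt closedAt bodyText author { login } }
    }
  }
}
"""


def export_issues(run_query, owner, name):
    """Collect all issues of a repository by following pagination cursors.

    run_query(query, variables) must return the decoded JSON response. It is
    passed in so the loop can be exercised without network access.
    """
    issues, cursor = [], None
    while True:
        data = run_query(QUERY, {"owner": owner, "name": name, "cursor": cursor})
        page = data["data"]["repository"]["issues"]
        issues.extend(page["nodes"])
        if not page["pageInfo"]["hasNextPage"]:
            return issues
        cursor = page["pageInfo"]["endCursor"]


def github_run_query(token):
    """Build a run_query callable that posts to the GitHub GraphQL endpoint."""
    def run_query(query, variables):
        request = urllib.request.Request(
            "https://api.github.com/graphql",
            data=json.dumps({"query": query, "variables": variables}).encode(),
            headers={"Authorization": f"bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    return run_query
```

Usage would be along the lines of `export_issues(github_run_query(token), "wilzbach", "tools-test")`, with `token` being a personal access token.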
BTW an alternative that might be worthwhile to consider for the transition period would be a two-way bridge: syncing all "transitioned" GitHub issues with their respective Bugzilla issues, but disallowing creation of new Bugzilla issues.
In other words:
- comment on Bugzilla -> comment on GH
- comment on GH -> comment on Bugzilla
- no sync for bot comments
- should apply to title change + labels too
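The rules above could be sketched roughly as follows. This is only an illustration of the decision logic, not a working bridge: the `Event` type and both function names are hypothetical, and it assumes the bridge itself posts through a bot account so that mirrored comments are not mirrored back.

```python
from dataclasses import dataclass


@dataclass
class Event:
    source: str                 # "bugzilla" or "github"
    kind: str                   # "comment", "title", "labels" or "new_issue"
    author_is_bot: bool = False


# Comments, title changes and label changes are mirrored in both directions.
SYNCED_KINDS = {"comment", "title", "labels"}


def mirror_target(event):
    """Return the tracker this event should be copied to, or None if it stays local."""
    if event.author_is_bot:
        return None  # no sync for bot comments
    if event.kind not in SYNCED_KINDS:
        return None
    return "github" if event.source == "bugzilla" else "bugzilla"


def accept_new_issue(source):
    """During the transition, new issues may only be created on GitHub."""
    return source == "github"
```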
I'd just go the simpler route of making Bugzilla read-only once the data has been moved to github.
FWIW, LLVM is also moving from their Bugzilla to GitHub issues (well, they will be after they finish migrating to GitHub, in ~3 weeks). The rationale is that GitHub is too important an avenue of reports for (new) users. I think they, too, are going to make Bugzilla read-only.
Link to LLVM discussion: http://lists.llvm.org/pipermail/llvm-dev/2019-October/136162.html
Could we start by moving individual projects ? I'm thinking installer, dlang.org right now.
Note that there is a "new" issue template that should also be very useful for us: https://help.github.com/en/github/building-a-strong-community/configuring-issue-templates-for-your-repository
@wilzbach : What are we missing to start on installer ?
What do I need to do?
There are three options:
- You can make me an owner of the dlang organization. This will give me administrative access and allow me to enable issues for each repository at the time I start to work on it, requiring no other interaction from your side. You can do that from here, just search for Geod24, click on the little wheel, change role, owner.
- ~You can make me an owner of installer. I don't have an exact guide for it, since it's harder for me to test (I would need a second GitHub account). I believe I need to be in a team which has direct access here but not 100% sure.~ EDIT: Tested, doesn't seem to work.
- You can enable the installer's issues here. Just check the "Issues" box and let us know. I will experiment with it, and when done, ask for another repository. Also you will need to invite me to the team that manages installer (and any subsequent repository we transition) so I am able to apply labels and the like. I might also need you to take some actions, e.g. to modify the labels at the repo and/or organization level (if we transition from Bugzilla we'll need to add a bunch of labels, e.g. platform, priority, etc.).
IMHO the first approach is by far the simplest / most straightforward, as long as you're comfortable with it. The second is probably the most correct but would require a bit of back and forth in the future. And the last one will be quite painful to me, I fear (I was testing out with my organization to see what I'd need).
In any case, once a component has been transferred, we will need someone with Bugzilla admin access to disable the component. Who should we contact for this ?
I'd just go the simpler route of making Bugzilla read-only once the data has been moved to github.
I've done this for gdc bugzilla before, based off http://toolsmiths.blogspot.com/2008/05/making-bugzilla-read-only.html
- Remove "Open for bug entry" for all products on editproducts.cgi
- Create a "canstilledit" group on editgroups.cgi. Check "Insert new group into all existing products", and only add admins to the group.
- Use "Edit Group Access Controls" on editproducts.cgi to check the "Canedit" boolean for only the new group "canstilledit", so it will become read-only for any users who are not members of the "canstilledit" group.
- Set announcehtml to point people to the new issue tracker on editparams.cgi?section=general. I have:
<div id="message">
Bug creation has been disabled, file new bugs at <a href="https://gcc.gnu.org/bugzilla">gcc.gnu.org/bugzilla</a>
</div>
It comes as an unpleasant surprise to discover this conversation so much time after it began considering my proximity to the subject. The informed participants have neglected to mention a few things:
- We considered such a migration a few years ago.
- I discovered problems with the idea.
- I began working on a project to improve Bugzilla, specifically replace it with Bugzilla Harmony, an official fork of bugzilla.mozilla.org (BMO). In the process I have been upstreaming patches and collaborating with Mozilla on the project. The live version is here: http://dbugs.k3.1azy.net/
- I stopped working on the above because of lack of general interest, and then this discussion happens.
Now the migration begins and one easily avoidable mistake has already been made (using a personal account instead of a machine account).
Why was none of the above mentioned, and why was I not involved in this discussion? I can't think of any explanation other than malicious intent. That one grumpy person's opinion is different from ours, so let's just not include them, no matter that they spent weeks researching and working on this same problem. Shame on you, guys.
To be clear. I wholeheartedly agree that the current Bugzilla instance, as it is at issues.dlang.org, is clunky and in dire need of improvement. What I've been suggesting this entire time is to investigate less radical options of improving it first, and see how much the situation improves without massively disruptive undertakings such as moving the entire issues database.
BTW one big argument against the migration was that the GitHub API is heavily rate-limited and we can't export the issues anymore. With the GraphQL API that's no longer a problem and we can easily export everything with a few paginated queries, e.g.
A good test for that theory would be to write a script which downloads all the issues on https://github.com/rust-lang/rust/. GraphQL rate limits are very different from the REST API, so it may not work as well as you expect.
Here are some things which we cannot do with Bugzilla:
- Be on github.com
- On the GitHub repository page, have an "Issues" tab, which goes to the bugtracker (a feature of Gitea, but not GitHub)
- When someone posts a link in a comment to an issue from a pull request, add a link in the issue
Here are some things which we can do with Bugzilla (the new Bugzilla version either already supports this, or can be improved to support this):
- Cross-link new issues and pull requests (dlang-bot does this now)
- Auto-linkify issue numbers in comments and commit messages to go to the issue (needs GitHub Pro)
- Allow users to identify themselves using the GitHub account, not requiring a second account
- Issue templates
- Markdown and syntax highlighting
- Edit posts
Here are some things which we can only do with Bugzilla, and not GitHub issues:
- A bug reporting wizard, with custom logic such as automatic test case reduction or bisection
- Non-boolean metadata (e.g. you cannot sort by severity with GitHub labels, only filter)
- Anyone can easily download all data and use it offline
- Own our data
- Have one issue number per bug
The last point is much more important than it may seem. If we have more than one issue number, the following problems occur:
- You can no longer say "issue ###" to unambiguously refer to an issue.
- Commit messages in existing commits which contain "issue ###" no longer unambiguously refer to an issue. You will need to check the date to see if it's pre- or post-migration.
- GitHub will auto-linkify "issue ###" in old commit messages, but it will link to the GitHub issue, i.e. it will create broken links.
- File names in the DMD test suite will no longer unambiguously refer to an issue.
- Comments in D source code will no longer unambiguously refer to an issue.
Unless this can somehow be avoided, it will cause a huge mess and never-ending confusion and frustration.
So, how to proceed? I was having severe difficulty gauging how important this issue is (if I knew about the discussion / interest then I would have dedicated more time to it), but considering the interest (:+1:s) here I would like to suggest the following:
- Finish the new Bugzilla (I guess I will be dedicating my next D time slices on that, so expect a result in probably a few months)
- Test and deploy
- Re-evaluate the situation in a few years
- If the results are still unsatisfactory, perform a careful migration to GitHub, avoiding as many pitfalls as possible
In my opinion we have more to gain from a polished Bugzilla instance than a messy GitHub migration. Thoughts?
Wow, I had no idea about all these pros and cons. Thank you! Lots to think about.
Now the migration begins and one easily avoidable mistake has already been made (using a personal account instead of a machine account).
That one is on me. Duly noted, and will fix. To be clear, the migration hadn't fully begun. I merely started experimenting with tools (a repository with very low bandwidth), in order to find out pain points.
And it did find some:
- We needed a mapping between Bugzilla emails and GitHub user accounts (which I compiled locally, but won't make public for obvious privacy reasons);
- We needed to replicate the categorization of Bugzilla in GitHub (done via labels); however, not all labels apply to every component, some labels were redundant, and some are completely unused;
- Attachments were not handled;
- Formatting in general was pretty poor. Neither code blocks nor commands are properly highlighted.
- A bunch of things related to dlang-bot.
- Specific to tools: Almost half of the bugs were OPTLINK-specific.
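To illustrate the first point, here is a rough sketch of how a generated issue body could use such a private email-to-account mapping without ever publishing the email. The function name and the wording of the header line are made up for illustration and are not taken from the actual migration tooling.

```python
def render_issue_body(bugzilla_id, reporter_email, text, accounts):
    """Build the GitHub issue body for a migrated Bugzilla report.

    accounts is the private mapping from Bugzilla email to GitHub login.
    The email itself is never written into the output.
    """
    login = accounts.get(reporter_email)
    author = f"@{login}" if login else "a Bugzilla user without a linked GitHub account"
    header = (
        f"Originally reported by {author} as Bugzilla issue {bugzilla_id} "
        f"(https://issues.dlang.org/show_bug.cgi?id={bugzilla_id})."
    )
    return f"{header}\n\n{text}"
```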
I wholeheartedly agree that the current Bugzilla instance, as it is at issues.dlang.org, is clunky and in dire need of improvement.
The need for improvements is not the only reason for this migration. It plays a big role, sure, but there is no denying that first-time contributors will find less pain in using GitHub than a separate website (even if they can log in via their GitHub account). That was mentioned by @thewilsonator here.
Here are some things which we can only do with Bugzilla, and not GitHub issues:
- A bug reporting wizard, with custom logic such as automatic test case reduction or bisection
You can do that on Github now, through Github actions. Additionally, anyone is able to work on the integration, instead of having that right limited to a few, overworked people.
- Non-boolean metadata (e.g. you cannot sort by severity with GitHub labels, only filter)
True, although throwing together a user script shouldn't be hard. And you can sort by priority on Bugzilla, but it's not really efficient. Priority labeling is very inconsistent across people: save for a few boolean ones (e.g. trivial and regression), what counts as normal, major, critical or blocker is easily confused. The labels put in place for tools, so far, expose 5 levels of priority (Regression, Blocker, Normal, Low, Trivial). I was originally hoping to merge Low and Trivial, but both seem to be used enough to warrant the separation (and to avoid the disturbance).
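A minimal sketch of what such a script could do, assuming issues have already been fetched as dictionaries with a "labels" list and that the label text matches the five levels named above exactly (the real labels may be spelled differently).

```python
# Priority levels from most to least urgent, as listed above.
PRIORITY_ORDER = ["Regression", "Blocker", "Normal", "Low", "Trivial"]


def priority_rank(labels):
    """Rank of the most urgent priority label; unlabeled issues sort last."""
    ranks = [PRIORITY_ORDER.index(label) for label in labels if label in PRIORITY_ORDER]
    return min(ranks, default=len(PRIORITY_ORDER))


def sort_by_priority(issues):
    """Return the issues ordered by priority, keeping the input order within a level."""
    return sorted(issues, key=lambda issue: priority_rank(issue["labels"]))
```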
- Anyone can easily download all data and use it offline
Sounds like an artificial point. If you do that, you either want to work on a bug offline (one can just save the webpage), or script something (like a bug reduction tool), which you can easily do via the GitHub API. In the latter case, your scripted tool might be able to pick up D code blocks more easily if we're on GitHub.
- Own our data
That's a very broad topic. The cost of owning our data is self-maintenance and implementation of features. If we really want to "own our data", and that point trumps all other considerations, we could consider GitHub Enterprise. But let's be serious, GitHub is less likely to vanish than any single contributor, and many of us are SPOFs (what happens if you go AWOL? Or Brad? Or Mike Parker?). What matters more to us, and Walter has made this pretty clear, is that we have a backup of all this data.
- Have one issue number per bug
That is indeed a pain point. I wanted to experiment with a few things with the tools repository. So far, I believe, using Issue XXX vs Issue #XXX can do the trick, although it's not the most user-friendly approach. Note that the import does not import closed / fixed bugs (it wouldn't make sense, since we can't retain the issue number anyway).
Comments in D source code will no longer unambiguously refer to an issue.
This needs to be fixed. DMD transitioned from issue number to links a long time ago, and that needs to be applied to other repositories as well.
File names in the DMD test suite will no longer unambiguously refer to an issue.
Likewise, we should provide a link for every test that refers to an old issue.
GitHub will auto-linkify "issue ###" in old commit messages, but it will link to the GitHub issue, i.e. it will create broken links.
Only when # is used, which is why I suggested the separation. But that problem already exists with pull requests, and I rarely see # being used for that reason.
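The suggested separation can be made concrete with a small sketch: a reference without "#" is read as a Bugzilla number, one with "#" as a GitHub number. This only illustrates the proposed convention, it is not something existing tooling enforces, and `classify_references` is a hypothetical name.

```python
import re

# "issue 123" -> Bugzilla, "issue #123" -> GitHub (case-insensitive).
ISSUE_REF = re.compile(r"\bissue\s+(#?)(\d+)", re.IGNORECASE)


def classify_references(message):
    """Return (tracker, number) pairs for every issue reference in a message."""
    refs = []
    for hash_mark, number in ISSUE_REF.findall(message):
        tracker = "github" if hash_mark else "bugzilla"
        refs.append((tracker, int(number)))
    return refs
```

A check like this could run over commit messages or source comments to flag references whose tracker is unclear.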
In my opinion we have more to gain from a polished Bugzilla instance than a messy GitHub migration. Thoughts?
Your post makes it sound like we're going to just fire a quickly written, automated script overnight, on all repositories. In practice most of the work so far has been triaging and cleaning up issues. I haven't fired any request to the GitHub API, but merely automated the issue body generation and some metadata. What did you find messy in the issues that have been moved?
The more TL;DR version would be:
- We have too many SPOFs at the moment, Bugzilla is one of them;
- We can't organically script any kind of integration with Bugzilla, we can do so with GitHub;
- It's much more newcomer-friendly;
The other concerns, we can easily work around. You are right that the bug ID is the main concern; however, due to our mixing of GitHub and Bugzilla, using # is currently not so common, so I think the pain you envision is being blown out of proportion.
To give an example of another benefit that switching to GitHub brings: much better categorization. At the moment, filing an issue on Bugzilla just leaves you with a blank box, no template. In GitHub you can provide issue templates. See the example I put there: https://github.com/dlang/tools/issues/new/choose