                        Umbrella issue: Switching from Jira to GitHub Issues
See https://lists.apache.org/thread/nkzbg0481k0dt0l2wq9b2k60kpg5hk62
- [ ] Issue migration
  - [ ] Decide how to import content (https://github.com/apache/arrow/issues/14513, ...)
  - [ ] User mappings (https://github.com/apache/arrow/issues/14510)
  - [ ] Labels etc. (see conventions below)
  - [ ] Issue links Jira <--> GitHub (https://github.com/apache/arrow/issues/14511, https://github.com/apache/arrow/issues/14520, ...)
  - [ ] Decide if only migrating open issues
- [ ] Establish conventions
  - [ ] Component
  - [ ] Fix Version
  - [ ] Affects Version
  - [ ] Type (https://github.com/apache/arrow/issues/14507)
- [ ] Code updates
  - [ ] Website/docs
  - [ ] Release/changelog scripts
  - [ ] Issue templates (https://github.com/apache/arrow/issues/14512)
  - [ ] TODO comments in code
 
> Decide if only migrating open issues

I think importing all issues regardless of status makes sense: it costs us nothing (apart from migration script run time), and it keeps all previous discussion and knowledge in one place. That helps new contributors (and old ones, though they might know to also check Jira) look through prior issues before opening a new one, and it keeps the directional links between issues intact.
Should we add something about severity/priority to the conventions? I am thinking mainly about how to manage and mark blockers for releases. Maybe a label marking blockers is enough and we don't need more granularity.
I think blocker should be a label outside of any other categorization schema. Having additionally at least a distinction between bug and enhancement makes sense imho. I don't think anything more granular than that is needed. (+components ofc)
I have searched through INFRA jira to see how other projects handled the transition to issues and have found some relevant tickets:
- https://issues.apache.org/jira/browse/INFRA-23322
- https://issues.apache.org/jira/browse/INFRA-23563
- https://issues.apache.org/jira/browse/INFRA-23386
 
Takeaways:
- It seems the PMC can switch the Jira project to read-only; later on it can also be hidden from the web (logged-in users can still find it, though).
- INFRA-23563 has a detailed description of their migration, including the use of a service account to import the tickets.
- BEAM developed a migration tool: https://github.com/damccorm/jira-to-issues cc @toddfarmer
 
> Decide if only migrating open issues
>
> I think importing all issues regardless of status makes sense as there is no cost to us (excl. migration script run time) and it has the benefits of keeping all previous discussion and knowledge in one place e.g. for new (and old but they might know to also check jira) contributors to look through prior to opening an issue or for the directional links between issues.

I opened https://github.com/apache/arrow/issues/14546 so we can discuss this further if anyone wants.
- Labels: I think we should review and clean up existing GH labels, which are a bit ad hoc. Any "official" labels should be namespaced so that they can denote multiple categorization schemes without being too much of a mess (for example "Component: C++" rather than simply "C++").
- Component: I would suggest prefixing existing component names with "Component: " and turning them into GH labels.
- Fix Version: can apparently be turned into a GH milestone.
- Affects Version: this one I find less useful in practice (and it's not consistently filled in); we might simply drop it.
- Issue Links: we should at least find a way to migrate the existing ones, as they contain valuable information. One possibility is to append them at the end of the issue description.
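
To make the namespacing idea concrete, here is a minimal sketch of how Jira field values could be turned into prefixed GitHub label names. The helper names are hypothetical, and the "Type" namespace is only an assumption (the thread so far only gives "Component: C++" as an example); this is not the actual migration script.

```python
from typing import Iterable, List, Optional


def namespaced_label(scheme: str, value: str) -> str:
    """Build a label such as 'Component: C++' from a scheme name and a value."""
    return f"{scheme.strip()}: {value.strip()}"


def labels_for_issue(components: Iterable[str],
                     issue_type: Optional[str] = None) -> List[str]:
    """Map Jira components (and optionally a type) to namespaced labels.

    The 'Type' namespace is an illustrative assumption, not a decided convention.
    """
    labels = [namespaced_label("Component", c) for c in components]
    if issue_type:
        labels.append(namespaced_label("Type", issue_type))
    return labels


if __name__ == "__main__":
    print(labels_for_issue(["C++", "Python"], "Bug"))
```

With a shared prefix, typing "Component:" in the label picker lists every value of that scheme together.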
 
Things like Affected Version could probably be part of the bug report template instead of being a label. They are helpful information when reproducing issues and thus would be good to have them included in the report, but they have less value in terms of filtering and thus are not very helpful as labels.
Absolutely 👍 on namespacing labels, it's very important to be able to get all possible values of a label in autocomplete.
Some more questions to discuss:
- What should be the new title format? Should we explicitly add the issue/PR number?
  Example from Beam:
- Do we want to keep enforcing "no PR without a linked issue"?
 
By way of a quick status update, this sample issue represents the current state of the import mapping work. The migrated content now includes a reference to this gist, a very early draft of migration reference information for the community. Feel free to contribute PRs against that as needed.
One big difference between Jira and GitHub is that normal users can only edit the issue body and cannot add any metadata (e.g. assigning the issue, adding labels) unless they are a committer. Some of this can be set initially via issue templates, but we might need to think about ways to make issue management available to more people.
We can add up to 20 triage users who can modify issues, but that is pretty restrictive.
Another option might be bot commands that let users update the issues they created, with an additional allow list for contributors who need more permissions, to extend beyond the 20 triage user slots.
- Issue type: it would be nice if we could keep a distinction between user-visible improvements ("Improvement" or "Feature request") and internal improvements such as refactors ("Task").
 
> One big difference between jira and github is that normal users can only edit the issue body and can not add any meta info unless they are a committer (e.g. assigning it, adding labels...). ... Another option might be to use bot commands that allow users to update the issues they created with an additional allow list for contributors with more permissions to extend the 20 triage user slots.

I would personally hold off on using bots for this until we notice it is actually a problem. In the short term, we can give triage rights to the people who triage issues and are not yet committers (at the moment I don't think that would be more than 20?).
A general thought about the migration and our usage of issues going forward:
I think it is important that we copy over relevant information for the existing issues in an appropriate way (otherwise we could just start fresh...) but we also want to avoid falling into the trap that is emulating JIRA 1:1 with issues. This is a new system with different capabilities and ways to do things.
Finding ever more intricate ways to emulate JIRA will only create unnecessary busy work and make it harder for new contributors to understand the system and why/how it diverges from "normal" GitHub issues.
#14648 tracks a problem encountered during a dry run where GitHub rejects importing certain issues with very large comments (possibly also very large descriptions).
@toddfarmer going through a few issues from your dry run, some things I noted (some might be known / on purpose):
- Links to pull requests don't yet seem to be included? For example https://github.com/toddfarmer/import_dry_run_3/issues/17912 (https://issues.apache.org/jira/browse/ARROW-18225) has an open PR to address it, but I don't see that mentioned in the issue. That's of course only relevant for the small subset of open JIRAs that have an open PR (for closed ones, the PR is typically linked in the last comment closing the issue).
- Milestone field is not yet populated?
- In some cases, it seems that links are not correctly converted by jira2markdown. I didn't look in detail at what might be different here, but see e.g. https://github.com/toddfarmer/import_dry_run_3/issues/18156
- Another jira2markdown one: it seems the converter cannot handle JIRA markup like `{{{}code{}}}`. This is not standard usage (one should use `{{code}}`; I suppose you might get the other form by clicking some of the buttons in the interface?), but I saw it in a few places (e.g. the `values[i]` in https://github.com/toddfarmer/import_dry_run_3/issues/18029). Not too important I think, but just noting.

The subtasks seem to work nicely!
@jorisvandenbossche:
> Links to pull requests don't yet seem to be included?

This is correct. I've opened #14710 to address this. Please comment there on requirements (e.g., would a comment in the migrated issue suffice? does the PR need to be updated, and if so, how?)
> Milestone field is not yet populated?

Correct - this is pending populating GitHub milestone metadata, which itself has a few open questions requiring guidance from the community.
> In some cases, it seems that links are not correctly converted by jira2markdown.

Thanks for noting - it seems related to the spaces in the Jira link syntax:
`[GitHub PR 14495 | https://github.com/apache/arrow/pull/14495] adds support ...`
I'm not sure how prevalent this is, or how easily it is addressed, but I've opened https://github.com/apache/arrow/issues/14711 to track it.
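
For reference, one possible way to handle the padded link syntax is a pre-processing pass before the text reaches the converter. The sketch below is a standalone regex, not part of jira2markdown's API; the function name is hypothetical, and it assumes links of the form `[text | http(s)://url]` with optional spaces around the pipe.

```python
import re

# Jira link syntax is [text|url]; some issues have spaces around the pipe.
# Assumption: only http(s) URLs without embedded whitespace are handled here.
_JIRA_LINK = re.compile(r"\[([^\[\]|]+?)\s*\|\s*(https?://[^\]\s]+)\s*\]")


def jira_links_to_markdown(text: str) -> str:
    """Rewrite Jira-style [text | url] links as Markdown [text](url)."""
    return _JIRA_LINK.sub(
        lambda m: f"[{m.group(1).strip()}]({m.group(2)})", text
    )


if __name__ == "__main__":
    sample = "[GitHub PR 14495 | https://github.com/apache/arrow/pull/14495] adds support ..."
    print(jira_links_to_markdown(sample))
```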
> Another jira2markdown one: it seems the converter cannot handle JIRA markup like `{{{}code{}}}`

Thanks, I've opened https://github.com/apache/arrow/issues/14712 to track this.
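
As a rough illustration of one possible fix, the non-standard markup could be normalized to the standard `{{code}}` form before conversion. This is only a sketch under the assumption that the empty `{}` pairs are spurious and can be dropped; the function name is hypothetical and this is not how jira2markdown itself works.

```python
import re

# Matches {{{}code{}}}: a monospace span with an empty {} pair on each side.
_PADDED_MONOSPACE = re.compile(r"\{\{\{\}(.+?)\{\}\}\}")


def normalize_monospace(text: str) -> str:
    """Rewrite {{{}code{}}} as {{code}}; standard {{code}} spans are untouched."""
    return _PADDED_MONOSPACE.sub(r"{{\1}}", text)


if __name__ == "__main__":
    print(normalize_monospace("see {{{}values[i]{}}} in the loop"))
```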
Can we also try to migrate the JIRA labels good-first-issue and good-second-issue? They are useful to mark issues suitable for fledgling contributors.
Good suggestion! I've created https://github.com/apache/arrow/issues/14724 to track that.
Since it seems we also started to use some JIRAs to track work related to the migration / github usage, I updated the top post with a list of those as well
I would like to propose that we migrate the Jira issues with this script some day in the coming week. I'll post to the ML as well. A test run takes about 8 hours, mostly due to API call throttling.
Test import and some example issues 1, 2, 3.
I think the key remaining questions are:
- Assignee / watcher treatment, see #14510
- Missing labels, see #14593
 
@rok Quick question: will JIRAs opened for the parquet-cpp be migrated as well? e.g. https://issues.apache.org/jira/browse/PARQUET-2225
At the moment we are not migrating the PARQUET issues, and the workflows are being updated to support GitHub issues for ARROW and JIRA issues for PARQUET. That said, I would like to start a conversation (probably on the mailing list) about how we could manage those issues in a way that simplifies the workflows. Moving parquet-cpp to its own repo or fully integrating it into ARROW (without a separate JIRA project) are possible solutions, but I'm not sure the tradeoffs are ones we want to deal with.
With the master -> main change complete we can probably close this out too?
Weren't there some follow up tasks to update website and docs to say main, and remove some "master|main" code? Or have those been done too?
Those remaining tasks are tracked here https://github.com/apache/arrow/issues/31142
This issue can be closed. Thanks to everyone for the good work!