team-compass
                                
                        Remove insensitive language from our code / docs
Across some of our repositories, we're using language that can associate positive/negative valence with words that are often associated with race. The most obvious examples are "whitelist" and "blacklist".
I propose that we make an effort to remove this language from our code and documentation, and replace it with language that has a more neutral connotation (e.g., "allowlist" and "excludelist").
I'm not sure exactly where this language does / doesn't exist in our code and docs, but I wanted to open up this issue to see if others agree with finding / replacing these terms.
note: jupyterlab is thinking about this as well, their discussion is here: https://github.com/jupyterlab/jupyterlab/issues/8533
YES 💛💛💛
Another thing we could think about (which is admittedly more tricky) is changing our default branches to something other than master. For example, main or production. https://leigh.net.au/writing/git-init-main/
Ah! I'm positive to explore this, but it would be nice to have piloted a switch on one project and tried mapping out the challenges before we make it systematic.
I currently strongly favor main over production because I think it makes more general sense, and as a small bonus it starts with "ma" which could help with autocomplete.
I think GitHub has a default branch name in repo settings, I wonder if that can be a GitHub org setting as well?
We could add a commit to the existing master branch declaring it deprecated and make main a copy that we continue from, or we could delete master entirely, or hmmm... I think deleting it may make more sense technically and socially if we decide on this. Better to error hard than not, I think.
A topic for our next meeting!
I've added this to the agenda for next week ✨
Thanks for bringing that up @sgibson91 ! Just as one datapoint: I've experienced switching the master branch to another name (I think we used main) on other projects - it did add some difficulties (people kept using master even though it wasn't the default on the project, I think because it's so common in Git). I think this could be particularly confusing for newcomers, since almost all "learn git" tutorials etc use "master" as the main branch, but not impossible to design around.
I think either way, we could add a note recognizing these issues and signaling our stance on them in documentation for the projects (maybe team-compass? but maybe that's too internal...)
A couple comments to think about in advance of the meeting:
- Shall we create a style guide before making changes (initially motivated by this, but could be useful anyway)?
- Are other Jupyter projects doing anything? If they are it would be good to follow what they're doing.
https://www.theregister.com/2020/06/08/developers_renew_push_to_get/ has links to changes made by other projects, useful for getting ideas for alternative terms. (Though for the sake of your sanity avoid opening the comments page).
Note - it looks like github might make this much easier to do by default as well: https://www.zdnet.com/article/github-to-replace-master-with-alternative-term-to-avoid-slavery-references/ so that would hopefully reduce some of the pain associated w/ newcomers and a non-standard branch name
cc @blink1073 who might be interested in this fact too
We also hardcode 'master' as default branch in binderhub: https://github.com/jupyterhub/binderhub/blob/9874c43388005775c29a9bcdabc49b8c16b6e93c/binderhub/static/js/index.js#L124.
We should probably have the backend detect 'default branch' rather than hard code it in the frontend
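As a rough illustration of detecting the default branch instead of hard-coding it, here is a minimal Python sketch. It assumes the backend can shell out to git and parses the output of git ls-remote --symref; the function names are hypothetical and this is not how binderhub is implemented.

```python
import subprocess
from typing import Optional


def parse_symref_head(ls_remote_output: str) -> Optional[str]:
    """Extract the default branch from `git ls-remote --symref <remote> HEAD` output."""
    for line in ls_remote_output.splitlines():
        # The symref line looks like: "ref: refs/heads/main<TAB>HEAD"
        if line.startswith("ref: ") and line.endswith("\tHEAD"):
            ref = line[len("ref: "):].split("\t", 1)[0]
            prefix = "refs/heads/"
            if ref.startswith(prefix):
                return ref[len(prefix):]
    return None


def remote_default_branch(repo_url: str) -> Optional[str]:
    """Ask the remote which branch its HEAD points to (needs git and network access)."""
    result = subprocess.run(
        ["git", "ls-remote", "--symref", repo_url, "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_symref_head(result.stdout)
```

Keeping the parsing separate from the subprocess call makes it possible to test the logic without touching the network.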
I haven't experienced switching the default on my own repos, but I've contributed to several that do e.g. develop as the default branch and reserve master for production, and it has been pretty seamless. So that should mostly leave existing forks/clones as having something to deal with.
I'd super love it if there were a widely accepted default to move to, but obviously don't want to wait for that. main is nice because it starts with the same two letters, so anyone browsing/autocomplete-prefixing/etc. is likely to find it when looking for 'master'.
I agree with the goals, but if possible I'd also emphasize uniformity. git is already hard enough to teach (I know by experience), and "what is this project's default branch" is more cognitive overhead. Copying and pasting sample solutions becomes harder. This isn't a problem for anyone reading this issue, but is a barrier to teaching people git.
So, I think it is worth waiting a few weeks to see what the new standard becomes, and following that. Or doing something and adjusting to the new standard again later, if it doesn't come quickly.
And while it's easy to deal with different names manually, I also have some tools that have to deal with branches and PRs automatically, and consistency is very useful there (even though it's not a technical requirement). In particular, when working on git-pr, I learned that origin/HEAD is set only on initial clone, and is the only way I have found to consistently tell what default upstream HEAD is. I will have to adapt stuff to this, which shouldn't be a problem, but I would prefer to avoid having everyone choose their own names!
GitHub is going to switch the default branch name shortly. Would be awesome if we can use the name they pick. I suspect it'll be 'main'.
Edit: We don't actually need to wait for that though. GitHub's a large for-profit organization, we aren't!
(The replies in that thread are heart-breaking 💔 😢 )
cc @henchc - a heads up that we might have to adjust the henchbot. If we can make it so that it automatically finds out the name of the default branch we can do that already now, otherwise we will have to make the switch as we switch over repos.
echoing @minrk , main seems perfectly reasonable to me and I suspect this is what github will use too. (also, 2 less letters \o/) though, if they pick something different I think we should just go with whatever they pick to reduce confusion
one way we could do this is to create an issue with each jupyterhub repository in order to track the status of the branch switch (as well as tracking the use of "blacklist/whitelist" etc). We can also post some docs on Discourse with instructions for how to make the switch so that it's not just one person that has to do it all.
https://github.com/dfm/rename-github-default-branch is a script that will update the default branch of repos. It doesn't yet(?) have the ability to also change the target of all open PRs.
If we have "edit on GitHub" buttons in our docs we also need to update the config in sphinx so that they point to the new main branch.
I wonder if it is better to have one central checklist of all the various things that might need doing in a repo, followed by a checklist where each converted repo is ticked off, or to create a checklist in each repo. It feels like you'd have to make all the changes in "one go" in a repo and that we will learn/find new things that need editing. All that makes me think a central checklist of tasks, with a list of repos, would be easier to keep current.
It would also give a sense of which repos are still missing. I think ending up with some repos one way and others the other way would be the worst of both worlds.
@betatim ah yes, one central checklist is what I meant above, but I don't think I explained that well haha
How easy would it be for us to (temporarily) synchronize master and main, such that any open PRs that are merged into master can be trivially merged into main?
My guess would be that it would be easier to create a script that changes the target of all open PRs
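A script along those lines could be sketched as follows, assuming the GitHub REST API endpoints for listing pull requests and for updating a pull request's base branch. The function names, the token handling, and the single-page listing are simplifications made for illustration only.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token):
    """Headers shared by all requests; the token is supplied by the caller."""
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
    }


def build_retarget_request(owner, repo, number, new_base, token):
    """Build (but do not send) the PATCH request that changes a PR's base branch."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/pulls/{number}"
    body = json.dumps({"base": new_base}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="PATCH", headers=_headers(token)
    )


def open_pull_numbers(owner, repo, old_base, token):
    """Return the numbers of open PRs targeting old_base (first page only)."""
    url = (
        f"{API_ROOT}/repos/{owner}/{repo}/pulls"
        f"?state=open&base={old_base}&per_page=100"
    )
    request = urllib.request.Request(url, headers=_headers(token))
    with urllib.request.urlopen(request) as response:
        return [pull["number"] for pull in json.load(response)]


def retarget_open_pulls(owner, repo, old_base, new_base, token):
    """Point every open PR that targets old_base at new_base instead."""
    for number in open_pull_numbers(owner, repo, old_base, token):
        request = build_retarget_request(owner, repo, number, new_base, token)
        with urllib.request.urlopen(request):
            pass  # the response body is not needed here
```

A repository with more than 100 open PRs would need pagination, which this sketch leaves out.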
Things I have learned (in case anyone is interested):
This shows the default $remote/$branch:
git symbolic-ref --short refs/remotes/origin/HEAD
but note that you have to know the remote name already, luckily that is often origin.
This will set/update it automatically (new to me):
git remote set-head origin --auto
Does anyone know the best way to infer the best-guess default remote? If you are on a branch, it might have a tracking remote set, but you don't have to be on a branch and branches don't have to be tracking anything. I'm not worrying about this for now.
And this is my current implementation of a git m command which will always take me to the latest default branch (ugly, untested, set-head has to be run or it has to be set when cloning, also not guaranteed to work all the time):
        m = "!f() { { brname=$(git symbolic-ref --short refs/remotes/origin/HEAD | cut -d/ -f2-) ; if test -n \"$brname\" ; then git checkout $brname ; else false ; fi ; } || { for brname in main master gh-pages; do git branch --format='%(refname:short)' --list $brname | grep $brname > /dev/null && git checkout $brname && exit ; done ; } ; } ; f"
This still assumes the default remote is "origin" but falls back to guessing if it fails. I will revise this as I use it more.
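For readers who find the one-line alias hard to follow, the same decision logic can be restated as a small Python sketch. The function name and its inputs are hypothetical: it takes the text printed by git symbolic-ref --short refs/remotes/origin/HEAD (or None if that failed) and the list of local branch names, and it uses exact name matching where the alias uses grep.

```python
from typing import Optional, Sequence

# Same guesses, in the same order, as the fallback loop in the alias.
FALLBACK_BRANCHES = ("main", "master", "gh-pages")


def pick_default_branch(
    origin_head: Optional[str], local_branches: Sequence[str]
) -> Optional[str]:
    """Choose the branch to check out, preferring the remote's recorded HEAD."""
    if origin_head:
        # "origin/main" -> "main"; mirrors `cut -d/ -f2-` in the alias,
        # so branch names containing slashes are kept whole.
        _remote, _sep, branch = origin_head.partition("/")
        if branch:
            return branch
    # origin/HEAD is unset or unusable: fall back to guessing.
    for candidate in FALLBACK_BRANCHES:
        if candidate in local_branches:
            return candidate
    return None
```

For example, pick_default_branch(None, ["gh-pages", "master"]) picks "master", because master comes before gh-pages in the fallback order.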
Oh, and the reason for this set-head stuff is I do git checkout -b <new-branch> origin/HEAD to make a new branch on the latest upstream commit in any repo - again, assuming origin is the upstream. It can be added as an alias or actually, I use it via git-pr which uses a little bit more logic to infer the upstream remote. I am also adding a git pr main to handle that too-long m alias above, but it's not there yet.
One should also do git fetch --prune to remove now-deleted remote branches, otherwise an old origin/master will hang around and can be confusing (since they are only updated, not removed by default).
With these, I don't expect any transitions to impact me much, at least in any of my projects (as long as we don't have too many different names for things...).
I noticed that the github CLI repo changed their default name to "trunk"... I wonder if that's what github will default to as well?
https://github.com/cli/cli/issues/929
Just giving an update on the plan of action that we came up with during this morning's meeting.
Phase 1:
- start work on renaming things like blacklist/whitelist
- do a graceful deprecation
- need to be a bit careful with dropping names because that might allow logins/repos/etc to work which were blocked
- will require some thought regarding API calls/authentication and order of preference there
- @minrk volunteered to create that PR
 
- write/publish short blog post when we deprecate the old names with a timeline of how long they will continue to exist and when they will be removed
Phase 2:
- renaming git branches and associated tooling
- wait a little bit to see what GitHub and other projects are doing so that we end up doing the same thing
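To make the "graceful deprecation" in Phase 1 concrete, here is a minimal Python sketch of one way old option names could keep working while warning users. The helper name is hypothetical, the renames are the ones proposed further down in this thread, and letting the new name win when both are set is an assumption, not a decision from the meeting.

```python
import warnings

# Old name -> new name, following the renames proposed in this thread.
RENAMED_OPTIONS = {
    "whitelist": "allowed_users",
    "blacklist": "blocked_users",
    "group_whitelist": "allowed_groups",
}


def apply_renames(config: dict) -> dict:
    """Return a copy of config with deprecated option names translated."""
    updated = dict(config)
    for old, new in RENAMED_OPTIONS.items():
        if old not in updated:
            continue
        value = updated.pop(old)
        warnings.warn(
            f"{old} is deprecated, use {new} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumption: an explicitly set new name takes precedence.
        # The old value is only used when the new name is absent, so a
        # block list set under the old name is never silently dropped
        # unless the user also set the new one.
        if new not in updated:
            updated[new] = value
    return updated
```

The important property is that a value set under an old name is still honored, so nobody who was blocked before becomes allowed just because of the rename.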
Thanks for taking the lead on this topic! We'll do the same in JupyterLab.
Could we try and agree some replacement terms for whitelist/blacklist so we're consistent across our repos?
- description of this PR: allowlist, excludelist
- https://github.com/jupyterhub/jupyterhub/pull/3090: allowed (though I've suggested allowed_users as an alternative on that PR), blocked, allowed_groups
- https://github.com/jupyterlab/jupyterlab/issues/8533: discussion over blocklist/excludelist/allowlist/includelist, but it sounds like they're waiting for us to decide
- https://github.com/jupyterhub/jupyter-server-proxy/issues/205: suggests allowlist
I'm okay with allowlist, blocklist if folks prefer a uniform direct-substitute, though I think being more simple and descriptive for each to-be-replaced value, as in your suggestion for allowed_users would be my preference for JupyterHub.
While the terms we are replacing have a jargony generic meaning that can be used in a number of contexts, I might lean toward picking simple descriptive names in each context rather than trying to create new, equivalent jargon that works in all the same contexts. What we do with them in JupyterHub (allow or block/deny user login) is not really what JupyterLab does (include/exclude plugins to enable), so I'm not sure there is a benefit to using the same language. For server proxy, I think allowed_hosts is more descriptive and clearer than host_allowlist, for instance. I do think we should be consistent across JupyterHub with allow and block (I've also seen deny in some projects, if anyone has a preference between block/deny).
So my personal choice for JupyterHub would be the pattern with allow/block for allowing/blocking login and using allowed/blocked_things for specific variants instead of things_allowed/blockedlist:
- whitelist -> allowed_users
- blacklist -> blocked_users
- group_whitelist -> allowed_groups
- team_whitelist (oauthenticator) -> allowed_teams
- etc.
But I'll go ahead with a more direct whitelist->allowlist, blacklist->blocklist substitution if folks prefer. I've seen other projects do this, so there's definitely precedent (rails uses denylist, Chrome, Android use blocklist).
What do people think?
I like the use of {blocked|allowed}_OBJECT as a guide.
Also for reference in other projects, if _thing is unambiguous (e.g. the jupyterlab extension manager only deals with one kind of thing, extensions), I see no benefit to including the thing in the name. But JupyterHub Authenticators have potentially several kinds of things to allow.
Following up on the suggestion above to create a script that changes the target of all open PRs, I've done this here: https://gist.github.com/sgibson91/44a1c3a6bbf34257dbdbb621a98dab0d
amazing, thanks so much @sgibson91 !!
We've landed the changes in jupyterhub (https://github.com/jupyterhub/jupyterhub/pull/3090) and dockerspawner (https://github.com/jupyterhub/dockerspawner/pull/381). We still need to do oauthenticator and possibly other authenticators, and make releases.