PHPCSExtra

NonInclusiveLanguage sniff

Open jrfnl opened this issue 4 years ago • 49 comments

A sniff to examine code and comments for the use of non-inclusive language and throw a warning when found.

Specifically, the sniff should look for sexist, racist, ableist or ethnocentric language, which can contribute to a hostile work environment.

Initial word list

| Search for | Alternatives to suggest | Notes |
| --- | --- | --- |
| whitelist, blacklist | allowlist/safelist/acceptlist, denylist/blocklist/rejectlist | |
| master, slave | primary/main, secondary/replica | |
| he, she, him, her, his, himself, herself | they, them, their, themself | may need to limit this search to comments |
| crazy | peculiar, baffling | |
| dummy | placeholder | |

Input requested and very welcome!!!

Particularly on:

  • additional words/terms to look for;
  • what should be the preferred alternatives.

What to examine:

Search for these in:

  • Comments and docblocks.
  • Variable names
  • Constant names
  • Namespace names
  • Class names
  • Function names

For constructs, report on these only when the construct is declared, not when used, as usage cannot be changed until the declaration has been changed.

Additional notes:

  • The sniff should be aware of variants of words, e.g. $white_list, whitelisting, etc.
  • The regexes used should be careful not to match too much, e.g. she should not match sheer, and master should not match mastering.
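The two matching rules above (catch variants such as `$white_list` and `whitelisting`, but don't let `she` match inside `sheer` or `master` inside `mastering`) can be illustrated with a minimal Python sketch. It is not the sniff's implementation, which would be PHP built on PHPCS; the term list and regex fragments are assumptions chosen for illustration. Identifiers are first split on underscores and camelCase humps so that ordinary `\b` word boundaries apply to each sub-word.

```python
from __future__ import annotations

import re

# Hypothetical term list: each regex fragment spells out the variants that
# should count (plural, -ed, -ing, or a separated "white list" spelling).
TERMS = {
    "whitelist": r"white[\s-]?list(?:s|ed|ing)?",
    "blacklist": r"black[\s-]?list(?:s|ed|ing)?",
    "master": r"masters?",
    "slave": r"slaves?",
    "she": r"she",
}

# Word boundaries on both sides: "sheer" must not match "she" and
# "mastering" must not match "master".
PATTERNS = {
    label: re.compile(r"\b" + fragment + r"\b", re.IGNORECASE)
    for label, fragment in TERMS.items()
}


def normalize(text: str) -> str:
    """Split snake_case and camelCase so each sub-word is space-delimited."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)  # masterKey -> master Key
    return text.replace("_", " ")  # white_list -> white list


def find_terms(text: str) -> list[str]:
    """Return the labels of all listed terms that occur in text."""
    normalized = normalize(text)
    return [label for label, pattern in PATTERNS.items() if pattern.search(normalized)]


for sample in ["$white_list", "whitelisting", "sheer", "mastering", "$masterKey"]:
    print(sample, find_terms(sample))
```

Listing the allowed suffixes per term, rather than accepting any continuation of the word, is what lets `whitelisting` through while still rejecting `mastering`.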

External references:

  • https://tools.ietf.org/id/draft-knodel-terminology-00.html
  • https://developers.google.com/style/inclusive-documentation

jrfnl, Jun 09 '20 00:06

This is a great initiative. Thank you for taking it up! I've found this list of disability terms with negative connotations a helpful resource.

joemcgill, Jun 09 '20 02:06

@joemcgill Thanks, though the credit should also go to @jdevalk.

Thanks for the link. I've had a look through the list, but there are only a few words there which I can imagine people would ever use in a code context. Maybe I'm wrong? Please tell me if I am!

The only ones which sprang out at me from that list (other than those already listed above) were:

  • "invalid" - the noun is the offensive word though, while in code, this is usually used as an adjective and I would not be able to distinguish between the two.
  • "disabled" - similar situation.
  • "dim" - similar situation, think "dimming the screen" which is quite different from calling a person "dim".

And possibly

  • "blind" - as in "double-blind testing"

Specific words to search for with suggestions for alternatives are most helpful to get this off the ground.

@vavroom would you care to comment?

jrfnl, Jun 09 '20 02:06

Very happy y'all are looking at this kind of thing :)

On the term disabled, I wouldn't be too worried. While for a very long time there's been a push to use "person with a disability", instead of "disabled", there's also been a massive push for using just disabled, by disabled folks. If you look on twitter for the #SayTheWord hashtag, you'll get a feel for it.

Also, the idea of ableist language is using a medical/disability related word in a negative context. A disabled button is pushing that envelope a bit. I'd not be too worried about it. Then again, I would hesitate to use disabled buttons but that's a story for another day :D

@jrfnl points out correctly that dimming the screen is very different from calling someone dim. Again, I wouldn't worry about it.

I'd be curious to know what blind folks think of "double-blind testing". I personally don't view it as objectionable, but then I'm not the target market of that kind of potentially ableist language.

vavroom, Jun 09 '20 03:06

@vavroom Thanks for taking the time to give feedback. Much appreciated.

jrfnl, Jun 09 '20 03:06

This is a great initiative!

I was also wondering about the terms disabled/enabled.

It feels unnecessary when there are terms like inactive/off/deactivated/restricted that do the job just as well... But I'm not disabled, so I don't think I can speak with any authority. Thanks for your input @vavroom

maccath, Jun 09 '20 09:06

I think there should be room to add words that don't have suggested replacements: brazenly, outright inappropriate words, such as the N word, or other derogatory terms, such as calling people with Down syndrome the M word, or the P word.

tomjn, Jun 09 '20 13:06

Really glad to see this. Thank you!

ChrisWiegman, Jun 09 '20 14:06

This is a good idea!

  • disabled is an HTML attribute, CSS selector, and property for elements in the DOM
  • invalid is a CSS selector

In code there is no alternative to that string, much like when spellcheck trips over referer. Even limiting the scope of the sniff for these terms to comments will probably cause a tiring number of false positives.

benlk, Jun 09 '20 15:06

@tomjn Good idea and those words which really shouldn't be used, should probably be an error. I'd be very surprised to ever come across those in code in the first place, but you're right: may as well check for them.

Is it OK if I approach you privately to verify that I interpret the letters you mention correctly? Or ping you to review the sniff to make sure I have added the right ones?

jrfnl, Jun 09 '20 16:06

Sure, but it's hardly an exhaustive list, and the P/M words might be more used in the UK than internationally. Happy to review

tomjn, Jun 09 '20 16:06

If the sniffer is going to sniff for a list of naughty words, https://github.com/LDNOOBW/List-of-Dirty-Naughty-Obscene-and-Otherwise-Bad-Words aims to have comprehensive lists.

benlk, Jun 09 '20 21:06

I think there's a difference between 'naughty' words and exclusionary language. For example, I can see a bunch of anatomical and sex related words on those lists which aren't necessarily used in a demeaning and derogatory way; it's less clear cut - and could end up being exclusionary in and of itself.

maccath, Jun 10 '20 09:06

Honestly we should probably make that sniff a separate issue and just go for the main goal here.

jdevalk, Jun 10 '20 14:06

I will add my honest opinion, which I guess will get a lot of downvotes: people and their behaviors are racist, ethnocentric, ableist, etc., not the words themselves when used in a totally different context.

But yeah, if it won't be enabled by default in this package or ever included in core, then why not? If someone wants to use it, feel free, I guess. Otherwise the idea is pretty cool, because I understand that whitelist/blacklist is not really great naming; allowlist/denylist is much more self-explanatory. But framing it as NonInclusiveLanguage and justifying it as racism is just wrong, because those words are not racist themselves; people using them in the wrong context are.

You could combine the word black with many other nouns which, if said in the wrong context, can be racist and offensive as well, not only blacklist. If we are really going this way, we should ban the word black in combination with anything else, just to make sure it's bulletproof for the future.

At this moment I am offended by this issue and description of it because of this:

> Specifically, the sniff should look for sexist, racist, ableist or ethnocentric language, which can contribute to a hostile work environment.

So you are saying that if I currently use blacklist/whitelist I am racist, and you suggest that I may have a hostile work environment? The idea for this sniff is great, but the explanation of why it's needed is wrong. Honestly, this description should be changed.

Jurigag, Jun 10 '20 15:06

@Jurigag The thing with racist words, or ableist words, is that it's about the people who are on the receiving end of those words. For example, I run https://ableist.is, a site to help make people aware of their own ableist language. I sometimes point people to that site. I am regularly told things like "I didn't mean crazy in a bad way, I'm not ableist". And it doesn't matter at all what they meant. What matters is that there are a lot of people for whom that word evokes really bad stuff.

The fact that you are offended that the language used in projects could create hostile work environments indicates that you are not likely part of one of the groups that are routinely discriminated against.

The words you use, the actions you take, do not mean that you are racist (or sexist, or ableist). But it's not about you. It's about the people that these words hurt. With all due respect, your intentions mean very little. I did not intend to drop a glass, but I did, and my wife stepped on the broken glass barefoot and got hurt. My intentions there mean very little. The end result is that my wife got hurt. It's a similar thing with racist words, or ableist words, or other words.

And when words like that are used in projects, people may feel, consciously or not, that it creates a hostile environment. Every time I hear people use words like "cripple", "lame", "crazy", it feels like yet another micro-aggression. I deal with these things several times a day, every day. Each instance isn't particularly bad. Just like one paper cut isn't particularly bad. But if you add them up at the end of the day, the week, the month, the year, it takes its toll.

Check your privilege.

vavroom, Jun 10 '20 16:06

How is blacklist hurting anyone? The origins of this word are not related to skin color or race; we as people only recently added these racist implications to it.

Then why not ban the word black entirely? We can figure out many combinations with other nouns which can make black people feel offended.

Jurigag, Jun 10 '20 16:06

> At this moment I am offended by this issue and description of it

Nobody said you were racist/sexist/ableist; we said the language is.

You've been informed and you have a choice.

Make of that what you will.

maccath, Jun 10 '20 16:06


Yeah, you said the language is, but also that my work environment can be hostile due to those words, and that's why I feel offended.

Jurigag, Jun 10 '20 16:06

> my work environment can be hostile due to those words, and that's why I feel offended.

So don't use them?

maccath, Jun 10 '20 16:06

This issue is concerned with the implementation and details, it isn’t the venue to air personal political opinions, let’s keep the issue focused, constructive, and move forward.

tomjn, Jun 10 '20 16:06

> So don't use them?

Why? Those words are not racist to me in the context I use them in, and I will keep using them. Words are not racist; the people using them, and their behavior to offend other people, are.

> This issue is concerned with the implementation and details, it isn’t the venue to air personal political opinions, let’s keep the issue focused, constructive, and move forward.

I agree, so change the description about things like a hostile work environment, or that those words are racist without any context. The first post here already airs personal political opinions; that's why I am concerned about this. I agree with the idea, but I feel this is yet again some kind of attack on other people, like "hey, you are racist" or "you may have a hostile work environment if you currently use those words".

Jurigag, Jun 10 '20 16:06

> Why? Those words are not racist to me in the context I use them in, and I will keep using them. Words are not racist; the people using them, and their behavior to offend other people, are.

I can only repeat what I said earlier. It's not about you. It's not about your intentions. It's about how people can react to these words.

Using racist or ableist words doesn't necessarily make you racist or ableist. You may not intend to create a hostile work environment. Nobody is saying you are racist. But. It's not about you.

Check your privilege.

vavroom, Jun 10 '20 16:06


And I just disagree with this, because this way we will just end up banning the word black entirely, simple as that. And that's also what I propose if we want to eliminate any racist implications in our code.

This issue and feature would be great, but without the racist implications and without suggesting that someone has a hostile work environment because they use them. There are many other reasons why we should not use blacklist/whitelist in programming, like the fact that they are not self-explanatory; allowlist/denylist are much more so.

Jurigag, Jun 10 '20 16:06

Check. Your. Privilege.

'nuff said.

vavroom, Jun 10 '20 16:06

And I checked: currently there is freedom of speech, and I can use any words I want. And everyone has that privilege. Ideas like this try to remove some words from use and from our vocabulary, and to reduce freedom of speech, because someone suggests that they have racist implications. You still didn't answer why not ban the word black.

You are currently saying that no matter what words I use and what I mean, if someone with a different skin color feels offended by it, I am racist. This logic is just wrong.

Jurigag, Jun 10 '20 16:06

@Jurigag I'm going to ask you kindly to remove yourself from this discussion.

  1. Like @tomjn said, your comments are not adding anything relevant to the issue at hand and can be interpreted as hostile and destructive to the discussion.
  2. Words like blacklist and whitelist are coming from a racist history. They are metaphors where "white" was associated with "good" and "black" with "bad". The fact that you don't intend them to be perceived as racist doesn't mean they are not. Please do a simple internet search and educate yourself before commenting on these kinds of issues again. P.S.: that is something completely different from using the word "black" purely as a colour, which is the literal meaning and, if used as such, not a problem.
  3. Even if you don't see it, because frankly that's irrelevant, non-inclusive language is part of the problem and causes micro-aggression on a daily basis, as @vavroom explained far more eloquently.
  4. As has been said before: Check Your Privilege. You say "there is freedom of speech", well that may be the case in your country. You can disagree with this issue, again, it is a privilege that you have the freedom to do so. Please do a simple internet search on privilege and educate yourself.
  5. Nobody is forcing you to use this sniff once it is created.

Please regard this as a formal warning.

jrfnl, Jun 10 '20 16:06

Is this just for American English, or will there be region-specific sniffs for other countries' dialects of English?

Is there a significant non-English-speaking PHP community that would justify creating a set of sniffs for non-English languages?

The reason I ask these questions is because separating sniffs by region or language may be easiest to implement from the beginning, rather than adding afterwards once people have integrated the first-contributed sniff into their workflow.

benlk, Jun 10 '20 22:06

A downside of region-based or language-based sniffs is that it would result in code duplication across sniffs where different cultures share some noninclusive words or phrasings. To reduce code duplication, would it instead make sense to have separate sniffs for each separate sort of noninclusive language, allowing sniff-runners to choose which noninclusive language sniffs apply to their situation?

As an example, having a sniff for disabled might cause problems for codebases that deal with <input> elements, whereas a codebase that doesn't deal with <input>s might prefer to include that sniff.

benlk, Jun 10 '20 22:06

@benlk Thanks, that's useful input and actually something I have been thinking about, though I haven't taken a decision yet.

My current thoughts are along the following lines:

  • Start with the NonInclusiveLanguage sniff and set it up to allow for multiple languages, though initially it will only contain English. As this particular sniff will use regexes, for additional languages I really would need trusted input from people very familiar with those languages to make sure I get those right.
  • By default all languages would be checked, but the sniff would have a configuration option (public property which can be set from the ruleset) to allow for making a selection of which languages apply to a codebase.
  • For edge cases, people can use the default PHPCS inline ignore comments, e.g. // phpcs:ignore.
  • Once that sniff is up and running, start collecting input for other related sniffs, like a NaughtyWordsSniff based on the link you provided earlier and for instance a sniff which checks only documentation for certain language/phrases which throw up fences for less experienced people, such as mentioned in this tweet: https://twitter.com/derickr/status/1270510702143430663

Code duplication won't be much of an issue, as that can be prevented by using an abstract sniff and/or traits for the shared code. It's one of the reasons this sniff library is built on top of PHPCSUtils, which offers a lot of that kind of tooling to make my life easier ;-)
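The language selection described above (every language checked by default, with a setting to narrow it down) might look like the following Python sketch. The `WORD_LISTS` layout, the `LanguageAwareChecker` class and its `languages` argument are hypothetical; in PHPCS the equivalent would be a public sniff property set from the ruleset, and its real name is not given in this thread.

```python
from __future__ import annotations

import re

# Hypothetical word lists keyed by language code; only English exists at first.
WORD_LISTS = {
    "en": {
        "whitelist": r"white[\s-]?list(?:s|ed|ing)?",
        "blacklist": r"black[\s-]?list(?:s|ed|ing)?",
    },
}


class LanguageAwareChecker:
    """Sketch of a checker with a configurable language selection."""

    def __init__(self, languages: list[str] | None = None) -> None:
        # None stands in for "not configured": check every available language.
        self.languages = list(WORD_LISTS) if languages is None else languages

    def check(self, text: str) -> list[tuple[str, str]]:
        """Return (language, term) pairs for every term found in text."""
        found = []
        for lang in self.languages:
            # Unknown language codes simply contribute no terms.
            for label, fragment in WORD_LISTS.get(lang, {}).items():
                if re.search(r"\b" + fragment + r"\b", text, re.IGNORECASE):
                    found.append((lang, label))
        return found


print(LanguageAwareChecker().check("$blacklisted"))
print(LanguageAwareChecker(languages=["nl"]).check("whitelist"))
```

Adding a language then means adding one entry to the word lists, reviewed by someone fluent in it, without touching the matching code.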

jrfnl, Jun 10 '20 23:06

You've reminded me of the ignore comments, and while I agree that those work in some situations, I'm not sure that they're the right option for inclusiveness sniffs. Having one monolithic NonInclusiveLanguage sniff implies that there is One True Way™ to do Inclusiveness™.

Including by default a sniff for gendered language would be anti-useful to an organization whose practice of inclusiveness involves gendering people by their desired gender. (Anecdote: As they/them is increasingly used for a third gender role in English-language discourse, the more people I see who reject its indiscriminate application to everyone as a form of mass misgendering, each instead desiring for themself he/him or she/her.) If such an explicitly-gendered organization wanted to add inclusiveness sniffs, requiring them to add comments around all their gender-someone-correctly code would be an obstacle to incorporating all of the other inclusiveness sniffs.

I've already said my piece on cases where a sniff for disabled might or might not be wanted, but the barrier for adopting inclusiveness sniffs is higher for an organization that would need to sprinkle their codebase with comments in order to adopt a monolithic NonInclusiveLanguage sniff.

Separating the inclusiveness sniffs into separate sniffs cuts the Gordian Knot of competing access needs by allowing each sniffer to use the sniffs that suit their community's needs, without requiring defensive commenting against the sniffs that satisfy other communities' needs.

There's precedent for splitting out sniffs: this repo has contradicting sniffs for the PHP short list syntax; it doesn't package a monolithic ListSyntax sniff.
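The proposal to split the checks per kind of non-inclusive language, while sharing the matching code rather than duplicating it, can be sketched in Python. The class names, the category names and the `AVAILABLE` registry are hypothetical stand-ins for PHPCS sniffs and a project ruleset; this is not PHPCSExtra code.

```python
from __future__ import annotations

import re


class WordListCheck:
    """Shared matching logic; each category subclass only supplies TERMS."""

    # Maps the reported term to a regex fragment covering its variants.
    TERMS: dict[str, str] = {}

    def check(self, text: str) -> list[str]:
        return [
            label
            for label, fragment in self.TERMS.items()
            if re.search(r"\b" + fragment + r"\b", text, re.IGNORECASE)
        ]


class GenderedLanguageCheck(WordListCheck):
    TERMS = {"he": r"he", "she": r"she", "him": r"him", "her": r"her"}


class AllowDenyListCheck(WordListCheck):
    TERMS = {
        "whitelist": r"white[\s-]?list(?:s|ed|ing)?",
        "blacklist": r"black[\s-]?list(?:s|ed|ing)?",
    }


# Hypothetical registry standing in for a ruleset: a project enables only
# the categories that match its own needs.
AVAILABLE = {"Gendered": GenderedLanguageCheck, "AllowDenyList": AllowDenyListCheck}


def run(enabled: list[str], text: str) -> dict[str, list[str]]:
    return {name: AVAILABLE[name]().check(text) for name in enabled}


text = "she maintains the whitelist"
print(run(["AllowDenyList"], text))
print(run(["Gendered", "AllowDenyList"], text))
```

An organization that deliberately genders people could enable `AllowDenyList` without `Gendered`, and would need no ignore comments for the category it leaves out.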

benlk, Jun 10 '20 23:06