Renaming Bus Factor

Open geekygirldawn opened this issue 3 months ago • 56 comments

I know that renaming what is probably our most widely used metric is going to be painful, but I think it's time to rename Bus Factor to something else.

The number of people I've had express pretty severe dislike of the name Bus Factor is quite high, and I often try to avoid calling it Bus Factor.

I often call it "Lottery Factor" because it's easy to understand: how likely is your project to survive if someone suddenly won the lottery, retired on a beach, and never looked at your project again?

Pony factor is more widely used, because it's been adopted by the Apache Software Foundation, but I find that it's harder for people to understand outside of the ASF. There isn't an easy narrative around it like what I have above for Lottery Factor.

I'd be really curious about the opinions from folks involved in inclusive naming initiatives and whether they've seen a commonly suggested substitute for Bus Factor.

I'm also curious about what the academic folks have seen. Is there a particular term that is more widely used in Academia / research?

cc-ing a few folks that I think would be interested in this discussion: @GeorgLink @germonprez @sgoggins @ElizabethN @klumb @dicortazar

I welcome any Chaotics to jump in with opinions.

geekygirldawn, Mar 28 '24 11:03

In my experience, academia (at least the part that's aware of open source software) uses bus factor. In academia, most people who don't know it, after having it explained, generally either get it and laugh, or still don't understand it (and don't really understand open source either).

I know I've also heard at least one other term that was more pleasant and still worked, but I can't quite remember it for the minute. (I've also heard truck factor, but I guess that's not really much of an improvement.)

danielskatz, Mar 28 '24 12:03

If we're to rename the metric, now is the time before we standardize anything with a standards body.

Bus Factor and Lottery Factor both describe an external event that would directly impact a project member and thus the project. There are a host of other events with the same potential impact: becoming a parent, getting laid off, moving to a new city, having to take care of family members, meeting a significant other, and so on. The metric could be named after any of these events and would still always require explanation.

Maybe we can choose a name that is directly descriptive of the problem or threat: concentration of knowledge, distribution of effort, ...

GeorgLink, Mar 28 '24 13:03

Maybe we can choose a name that is directly descriptive of the problem or threat: concentration of knowledge, distribution of effort, ...

When I talk about this issue, I generally frame it as a discussion of "Contributor Sustainability", but it's probably only one of a number of things that impact contributor sustainability.

I still think the metric should be named something that's already established within our community and the literature, which might make Pony Factor a better choice. I like Lottery Factor, but it's definitely not as well known.

geekygirldawn, Mar 28 '24 13:03

There are two things I think don't work well about "Bus Factor":

  1. Not everyone likes or intuitively understands the purpose of "bus" in the phrase without explanation
  2. "Factor" feels like a word that works, but isn't as intuitive as something like "count".

I get skeptical looks and confusion when I explain that a project's "bus factor" is 3 and that this is a bad thing. I feel that if we had language that more intuitively indicated what the bus factor is, it would be more persuasive and useful. I know "bus" is what we're moving away from, but... when I think of projects making progress, I usually think of forms of transportation: trains, planes, boats, cars, etc. They move many people around and need critical pieces to keep them moving.

We could borrow some of these ideas, and swap "factor" for "count" like:

Captain Count
Pilot Count
Engine Count
Turbine Count
Tether Count
Anchor Count
Driver Count
Wing Count (gets a little dark, if you think of it)
Battery Count

I'm also happy to turn away from "bus" like things, maybe options from nature without conflicting with git?

Root Count
Host Count
Monarch Count

That's if we're willing to get creative though. I think if we plan on changing the name to something, Bus Factor is certainly the most well known -- so we should update it to something more intuitive and descriptive. Out of the options I gave, I like Host, Pilot, Captain, and Monarch. Excited to have a discussion about it.

The confusing wordplay also exists for "elephant factor" -- but that should probably be a separate discussion :)

GaryPWhite, Mar 28 '24 13:03

I would vote against Pony Factor, as it's based on an in-joke that is just confusing to people who aren't in the group (because ASF is full of ponies, or people who think they are ponies).

danielskatz, Mar 28 '24 13:03

@justaugustus I'm curious if the Inclusive Naming Initiative have had any discussions about this or related terminology?

geekygirldawn, Mar 28 '24 13:03

I usually use lottery as well, pony doesn't make a lot of sense to me. On count vs factor, what about something like frequent contributors count, which parallels with 'inactive contributors' and 'new contributors'. Bus factor assumes something about the impact of these specific people leaving which might or might not be true depending on additional context. Just calling it a count of people who undertake a certain level of activity leaves it more neutral and more clearly as just part of the fuller picture.

starsplatter, Mar 28 '24 13:03

I usually use lottery factor as well.

cdolfi, Mar 28 '24 14:03

I like to use names/terms that are easy to read and do not have implied meaning or metaphor. For something like this in other kinds of projects or organizations, it is sometimes called 'key person risk' (or key people/member risk). I'll toss 'key maintainer risk' in here for consideration, since when I read things like 'lottery factor' or 'pony factor' I have to go look up what that means in this context (and it may be even harder for those who don't have English as a first language); 'key maintainer risk' seems closer to describing exactly what is being measured.

PaulaPaul, Mar 28 '24 14:03

I would love to not use "bus factor" and usually use "lottery factor" instead. And I always explain in a few words what that means, as I'd explain the naming for any of our metrics. In my opinion, nothing is intuitive to everyone. Even something like "event location inclusivity" requires a few words of explanation of what we mean by that.

  • I understand the need for consistency, but if we have the opportunity to make open source more inclusive overall, I think we should do that.
  • I agree with @GeorgLink that focusing on the issue that is causing risk makes a lot of sense. I think it's also about access, not just knowledge. Who has the keys to the castle, so to speak.
  • I like @GaryPWhite's suggestion to use "Count" instead of Factor.
  • I agree with @danielskatz that the use of "pony" is confusing and gate-keepy (I'd actually never heard that).
  • I agree with @starsplatter that simply focusing on a count of people who undertake a certain level of activity is neutral.
  • I like @PaulaPaul's suggestion of "key person" or "key maintainer" because we're also talking about the folks that have access to all the levels of the project.

What about "Key Maintainer Count" or "Core Maintainer Count"?

ElizabethN, Mar 28 '24 15:03

@ElizabethN I would be concerned with using the term "maintainer", as for many projects that has a very specific meaning. Maybe "Key Contributor Count".

cdolfi, Mar 28 '24 15:03

KCC has a nice ring to it, and it's definitely more clear than an analogy.

GaryPWhite, Mar 28 '24 15:03

Oooh, I like Key Contributor Count.

geekygirldawn, Mar 28 '24 15:03

For some reason, I thought we had already addressed this one. Thanks for bringing this up @geekygirldawn. The name is definitely problematic, and I agree pony factor isn't a good option either. We could still use them as keywords, though, and link them to the new name.

I like Key Contributor Count... Or Key Contributor Risk.

klumb, Mar 28 '24 16:03

...or Core Contributor Risk.

We have defined Occasional Contributors (which was previously problematic as "Drive-by Contributors").

However, we have not defined key or core contributors. Academic literature usually uses "core", but "key" may be more descriptive of contribution importance.

klumb, Mar 28 '24 16:03

I like Risk better than Count, as it has the same sense (of urgency/danger) that Bus Factor has. It also feels less like something people would try to game.

danielskatz, Mar 28 '24 16:03

At risk of sounding like a typical tech exec.... Wouldn't "gaming" this metric be a good thing? More people contributing to OSS at a level that counts toward the bus factor seems like a good thing...

I'll put in that I think "risk" being a number runs the same risk (ha) as using a word like "factor". Without explanation, "my key contributor risk is 3" is a nonsensical phrase.

GaryPWhite, Mar 28 '24 17:03

I like Key Contributor Count.

My concern with "key" is that it adds a value judgement. Also following Elizabeth's comment: not everyone needs to have a key to the project to be included.

Returning to the definition of the metric, we're counting the smallest number of contributors that made 50% of all contributions during the analysis time window.
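A minimal Python sketch of the counting described above, assuming contributions have already been tallied per contributor for the analysis window (the function name `smallest_contributor_count` is hypothetical, not taken from any CHAOSS tool). Ranking contributors from most to fewest contributions and stopping once the running total reaches half guarantees the smallest such group:

```python
from typing import Mapping


def smallest_contributor_count(contributions: Mapping[str, int]) -> int:
    """Smallest number of contributors who together made at least 50% of all
    contributions in the analysis window.

    `contributions` maps a contributor identifier to their contribution count
    (hypothetical input shape; what counts as a contribution is up to the caller).
    """
    # Rank from most to fewest contributions: the top-k contributors always cover
    # at least as much as any other k contributors, so the first k that reaches
    # half of the total is the smallest possible group.
    ranked = sorted(contributions.values(), reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0  # no contributions recorded in the window
    running = 0
    for k, count in enumerate(ranked, start=1):
        running += count
        if 2 * running >= total:  # integer check for "at least 50%"
            return k
    return len(ranked)  # not reached: running == total always passes the check
```

For example, `{"alice": 40, "bob": 35, "carol": 25}` gives 2: alice alone covers 40% of contributions, while alice and bob together cover 75%. Whether the threshold should be "at least 50%" or a strict majority is a naming-adjacent choice worth pinning down in the metric definition.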

How about: Majority Contributor Count

"Majority" because people know that concept from voting and other contexts.

We could also emphasize that a larger count is good and go with something like:

  • Majority Contributor Spread
  • Majority Contributor Concentration

GeorgLink, Mar 28 '24 17:03

In truth, without explanation, any name we choose is likely to be nonsensical. Some are more descriptive than others, though. The metric should describe what we are trying to measure, which is the risk associated with key contributors abandoning a project (I think). I wouldn't get hung up on a number.

klumb, Mar 28 '24 17:03

@klumb I get ya. The measurement is absolutely indicating the risk. I would agree more with "risk" if the metric itself weren't a count/number. Naming the measurement after what we're actually measuring is more descriptive. If we're renaming it anyway, why be vague? We could keep metaphorical names like "pilot risk" etc., but that hardly solves the #2 problem I mentioned above, where I regularly have to explain what the metric actually is for people to buy into why it's useful. Just my experience, though.

I like @GeorgLink 's observation. Majority Contributor Count is ultra-succinct and descriptive. I didn't even think about how "key" could mean like "having a key". Majority is much more specific.

GaryPWhite, Mar 28 '24 18:03

Also, I think a value judgement is going to be necessary for this metric. What the value is, is the question. Is it related to ownership/authorship of a percentage of the codebase?

klumb, Mar 28 '24 18:03

If it is about percentage of codebase, rather than contributor, maybe we need it to be about contribution authorship. For example, Majority Contribution Authorship, Majority Contribution Spread, or Majority Contribution Maintainership? Contribution Maintenance Risk? Majority Contribution Count?

Just throwing some more out there. ;)

klumb, Mar 28 '24 18:03

I really like majority, I think it removes the value judgement of 'key'. This count in the chaoss metric reads to me as just a naive count of how many contributions people make as a percent of the total number of contributions, it says nothing about the value of those contributions in terms of code quantity or quality, which I think argues for keeping the metric as more of a single neutral data point. The metric says it wants to answer "how many contributors can we lose before a project stalls?" but that seems packed with assumptions to me.

starsplatter, Mar 28 '24 18:03

Sorry, but I have no idea what majority means in this context. And given that not all open source contributions are captured in a repository, how would it be measured?

danielskatz, Mar 28 '24 18:03

The metric says it wants to answer "how many contributors can we lose before a project stalls?" but that seems packed with assumptions to me.

It totally is! That's part of the magic IMO 😄 There's some stake-in-the-ground happening here. What kind of assumptions need to get made to actually measure something, ya kno?

And given that not all open source contributions are captured in a repository, how would it be measured?

While it's perfectly possible this isn't an accurate representation, I believe that the metric and its implementations are usually disjointed. I believe most of the time, contributions are "counted" here as "commits" or "pull request open/close" or "issue open/close". That's just a function of using the GH API / history to make measurements... More tools could definitely get built to measure more, though 😃
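As a hedged illustration of that kind of counting, the sketch below tallies per-contributor contribution counts within a time window from activity records assumed to be already fetched. The `ContributionEvent` type, the `tally_contributions` function, and the event-kind names are hypothetical stand-ins, not GitHub API objects:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

# Hypothetical event kinds mirroring commits, pull request open/close,
# and issue open/close; anything else (e.g. comments) is not counted.
COUNTED_KINDS = frozenset(
    {"commit", "pr_opened", "pr_closed", "issue_opened", "issue_closed"}
)


@dataclass(frozen=True)
class ContributionEvent:
    """A hypothetical, already-fetched activity record."""
    author: str
    kind: str
    timestamp: datetime


def tally_contributions(
    events: Iterable[ContributionEvent], start: datetime, end: datetime
) -> Counter:
    """Count counted-kind events per author within the window [start, end)."""
    return Counter(
        e.author
        for e in events
        if e.kind in COUNTED_KINDS and start <= e.timestamp < end
    )
```

Which event kinds count as a contribution is exactly the kind of assumption being debated here, so the sketch keeps it in one explicit set that a project could widen to include contributions captured outside the repository.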

Sorry, but I have no idea what majority means in this context.

"Majority" here meaning who is making the majority of contributions. Majority Contributor Count = count of contributors who make the majority of contributions in the project for some time window.

GaryPWhite, Mar 28 '24 18:03

I like what the bus factor means: a project is one disaster away from being completely abandoned or unmaintained. Some maintainers might keep maintaining if they win the lottery :)

I think the seriousness should be retained, because that seriousness is what gets people (leaders, people with influence) to act (majority contributor count, IMHO, not so much), but I agree bus factor is morbid. I'd propose, then, something more like 'disaster factor', because it has meaning immediately.

emmairwin, Mar 28 '24 18:03

Adoption may be better for Disaster Factor because it is close enough to the previous problematic name. It also signals the risk part. I think that could work.

klumb, Mar 28 '24 18:03

Disaster Factor = The risk associated with a count of contributors, who authored a majority of contributions in the project for some time window, abandoning a project. It is probably a good idea to review the description and objective of this metric as well.

klumb, Mar 28 '24 18:03

This is such an interesting discussion! I understand the concerns with the use of terms like 'key' (that would require definition), and the discussion of what we are really trying to measure or gauge here. There is an element of 'risk' (is this project at risk if any one person decides to stop contributing?), and an element of project 'resilience' and 'sustainability' (could the project survive and thrive without a specific, small, number of engaged contributors?).

At the risk of making this more complicated, is there a rubric that is used to come up with this number, so the name of the metric might be less of a concern (its explanation would be the rubric)? I like the words 'adoption', 'risk', 'sustainability', and 'resilience' because they are less problematic (for me) than 'bus' or 'disaster'.

PaulaPaul, Mar 29 '24 14:03

We could use the GitHub poll capability in the Discussions area to create a poll from the names that have been suggested here and solicit votes from the community. Let me know if you'd like me to put that into a Discussion thread (I'm still relatively new to this community - sorry if that's been rehashed or if there is a different norm for this sort of thing!)

PaulaPaul, Mar 29 '24 14:03