rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Scoped gems proposal

Open mullermp opened this issue 2 years ago • 66 comments

rendered proposal

Hello RubyGems team, and all those that come across this.

In this PR, I have included a proposal for a feature called "scoped gems". In short, the proposal is to widen the gem naming specification to include a new character @ to group related gems together under a specific organization reserved suffix. The naming pattern follows gem_name@scope. On the first gem push (new record), if the gem is scoped (follows the pattern), the gem's scope will be validated to have been created by a user from an organization that reserved the scope. A scoped gem can be installed and required as normal gems are today.

For example, consider aws-sdk-s3, the S3 gem for AWS. If this gem were scoped, it could be published as s3@aws-sdk (or more generically, <service>@aws-sdk). This gem can only be created by a user in the AWS organization (which has reserved the aws-sdk scope). A user can install this gem with gem install s3@aws-sdk and require it with require 's3@aws-sdk'.

The main benefits of this feature are that organizations can publish their own groups of gems (i.e. multiple organizations can have a "configuration" gem), and organizations are able to reserve gem names (via @scope suffix, similar to a reserved prefix). A developer can be reasonably sure that any new gem such as new-cool-feature@rails is an official Rails gem, or new-s3-service@aws-sdk is an official AWS SDK client, or even socket@ruby to be an official stdlib gem! This reservation system combats "fake" "similarly named" gems that are branded as official that attempt to steal personal information.

Please leave any feedback and I would be happy to amend the approach/design.

mullermp avatar Apr 05 '22 00:04 mullermp

I feel like namespaces with the syntax @scope/gemname would be less surprising to users given that is the way NPM implements them. What are your feelings for/against that syntax?

(I also added a link to the rendered doc to the top of the PR description for easy access.)

indirect avatar Apr 05 '22 00:04 indirect

The / portion gave issues, especially around gem commands (because it assumes a file path), though I suppose that can be parsed correctly. Aside from that, I actually think we should NOT use the same format as it could be especially confusing when working with both languages (i.e. Rails website).

Edit (5/5) - I had updated the RFC explaining the technical challenges/limitations of using @scope/gemname. I would prefer to use @scope/gemname too but the risk/reward is not desirable in my opinion.

mullermp avatar Apr 05 '22 00:04 mullermp

If we can't get proper organisations, this might be a good first step.

I'd be a tiny bit concerned this works us into a position where supporting proper organisations is harder.

It's definitely concerning when you create a gem like async and then someone else clobbers your namespace by releasing a gem async-thing-you-care-about. Not sure if this proposal solves that problem?

ioquatix avatar Apr 05 '22 00:04 ioquatix

That is valid feedback. I would think that a user's scope can be easily transferred to an organization concept in the future assuming the scoped username shares the future organization name (i.e. s3@aws-sdk is scoped to the aws-sdk user today, and then scoped to the aws-sdk organization tomorrow - it doesn't change how that gem is installed/required). I have an open question related to migration, that makes use of an "alias" feature. If we go that route, then to support organizations, I suppose you could even alias a gem like s3@aws-sdk to s3@new-organization-for-aws and not require the changing of code for developers either.

Though what do you mean by "proper" organizations? I assume you mean some Rails modeling that groups a bunch of owners together under some entity. Since gems can have multiple owners (already an "organization" if you will), I thought this solution fit in nicely. This proposal can certainly have a concept of such an organization entity and use the same gem naming pattern.

Also, regarding the async example, I believe that same issue exists today regardless of gem name. My gem foo can have an Async module, and your gem async could also have the same module. I don't think we can realistically solve for that, if I'm understanding your concern.

mullermp avatar Apr 05 '22 00:04 mullermp

Part of my concern is asserting that "async-foo" is official and "async-foob" is not.

ioquatix avatar Apr 05 '22 00:04 ioquatix

Ah. I want to be clear that async@foob wouldn't be more or less "official" than async@foo in that example. The scope signals to developers a trusted source/organization. Both foo and foob can be valid users with valid gems that solve use cases.

mullermp avatar Apr 05 '22 01:04 mullermp

At first glimpse, I like this idea. It does not break too many things and seems relatively simple in implementation. My only worry is, that it won't be compatible with PURL format: https://github.com/package-url/purl-spec convention where the namespace is defined before the name:

scheme:type/namespace/name@version?qualifiers#subpath

though it could be fixed by the routing itself in the rubygems.

mensfeld avatar Apr 05 '22 07:04 mensfeld

Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/[email protected] it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%[email protected].

mullermp avatar Apr 05 '22 17:04 mullermp

Summon @simi @hsbt, I'd love to hear your feedback on this!

mullermp avatar Apr 07 '22 20:04 mullermp

I was thinking about this for long time. My biggest concern is transition time. I was thinking about some kind of backward compatible naming scheme (at least for fallback). For example if we decide to do namespacing using "rails/activerecord" scheme, we can fallback for older RubyGems/Bundler by using "rails--activerecord" (or similar, double dash is used only in few gems we can yank currently according to DB dump).

simi avatar Apr 07 '22 20:04 simi

Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

Fryguy avatar Apr 07 '22 20:04 Fryguy

I think "anticipated" squatting can be mitigated (simi mentioned yanking).

mullermp avatar Apr 07 '22 20:04 mullermp

@simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

mullermp avatar Apr 07 '22 20:04 mullermp

Similar to that concern, I have a concern that people might squat transitional names if that's what we decide on...for example, it would be bad if I could create a rails--activerecord gem now anticipating the real gem moving to the new format

We can reject gems with -- at RubyGems.org and add it to RubyGems specification policy as a warning.

@simi Are you suggesting scoped gems ALSO reserve Ruby namespaces? I.e. s3@aws-sdk MUST use an AWS::SDK Ruby namespace?

No, we don't check anything in the code and I think it would be super complex to start doing that.

simi avatar Apr 07 '22 20:04 simi

Hi!

I like this feature proposal, and I think there's one extra benefit that hasn't been mentioned: it makes "soft-forking" easier. Say Rails wants to temporarily fork the "mail" gem, to provide a better experience until mail gem owners can get to addressing some important issues. Right now, the only way is to come up with a new name, and it's not clear how to properly communicate that the fork is only something temporary, and not meant to completely sunset the forked gem. Releasing mail@rails and depending on it temporarily makes this intention more clear I believe.

My main concern is namespace squatting too. I'm not sure I understand the current proposal and how do we prevent it. Can rubygems.org users create "custom scopes" (different from their usernames)? If that's the case, what prevents any random user to create the @rails namespace? If not, then it seems aws-sdk would be already squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

Regarding old clients, is the scope--name notation meant so that the feature works as is in old clients? I'm not sure we should choose a weird naming scheme just to support old clients. I think Bundler with a Gemfile.lock file would handle this pretty well since it's able to trampoline to the version that created the lockfile.

Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

deivid-rodriguez avatar Apr 07 '22 21:04 deivid-rodriguez

@deivid-rodriguez Thanks for the feedback!

My main concern is namespace squatting too. I'm not sure I understand the current proposal and how do we prevent it. Can rubygems.org users create "custom scopes" (different from their usernames)? If that's the case, what prevents any random user to create the @rails namespace? If not, then it seems aws-sdk would be already squatted? Maybe there should be a transitional period where new scopes need to be explicitly approved to avoid this?

I reserved aws-sdk user immediately prior to posting this RFC :D. I think rails user is also owned by the Rails team. I made the assumption that your username is your scope. I think we can certainly use an "organization" here although it requires more thought/design. An "organization" can simply be a group of users on Rubygems, who have access to 1 or more "scopes". Alternatively, the organization name can be the scope itself. I think ultimately there may need to be some initial enforcement.. some users will go and squat some names but we can root them out - I think we can reliably assume users aren't building new software on most of those published gems.

Personally, my preferred naming scheme is the one suggested by @indirect, although I understand it would require more work due to the ambiguous "/".

Yeah, I can see reasons to want it. I started with this approach first but it introduced some complications. Specifically, we'd have to handle these cases: No such file or directory @ rb_sysopen - @mullermp/hola-0.0.0.gem where the gemspec's name assumes a path. Even when fixed, when it's installed, it may also create another nested directory in your gem install location, and that may or may not play nicely with existing tooling? I went down a rabbit hole and decided it was more effort than it's worth, but I could be wrong.

I'm not too sure about how to migrate to the new scheme, it seems quite complicated. I guess duplicate pushing would be best, maybe enhancing the clients to ease it, for example, something like gem build s3@aws-sdk --alias aws-sdk-s3 that builds a "duplicated gem" with the proper legacy naming.

I'm ok with duplication as it would certainly be the safest option. In practice, as a gem maintainer with 300+ gems, it's not as feasible and probably causes some customer confusion. I think a one-way one-level alias might make sense, but we'd have to handle gem install locations too, perhaps a symlink between s3@aws-sdk (real source) to aws-sdk-s3 (symlink folder). If it's handled by Rubygems via alias, it prevents a lot of duplicative work by maintainers.

mullermp avatar Apr 07 '22 22:04 mullermp

👋 I'm positive to add this feature. But I'm not sure what the best syntax about gem_name@scope same as @indirect

We can choose:

  • gem_name@scope
  • scope/gem_name
  • @scope/gem_name
  • etc.

Does anyone summaries scoped namespace feature of other package manager?

hsbt avatar Apr 07 '22 23:04 hsbt

I think one of these is the most reasonable:

scope/gem_name
@scope/gem_name

the latter seems to be the format used by npm IIRC, but I think the former can be slightly better (what's the reason for/motivation of @ character?)

ioquatix avatar Apr 08 '22 01:04 ioquatix

I think the motivation for the at-sign is to clearly distinguish the scope (@scope) from the package name (gem_name). It also makes it possible to talk about the scope separately from the package with the same name. For example, npm hosts a webpacker package in the @rails scope, named @rails/webpacker. Without the @, it's impossible to tell if rails means the scope or the top-level package rails.

indirect avatar Apr 08 '22 02:04 indirect

A tricky question about scoped packages: in Node, it is completely fine to have both @rails/mail and mail in a single project. They do not conflict.

In RubyGems, it would be (presumably) impossible to have both @rails/mail and mail in a single project, because they would both define the Mail constant. That means either a lot of weird errors if someone adds both packages, or a lot of extra work in RubyGems and Bundler to prevent that kind of conflict.

Can a dependency on mail be satisfied by @rails/mail? If not, it's probably impossible to have a Gemfile that resolves. If yes, that's even more work that needs to be done inside RubyGems/Bundler.

indirect avatar Apr 08 '22 03:04 indirect

@indirect if I'm understanding the original proposal:

# @rails/mail
module Rails::Mail
# mail
module Mail

Is that correct?

ioquatix avatar Apr 08 '22 04:04 ioquatix

Interesting, I have never heard of PURL. Though looking at npm's example pkg:npm/%40angular/[email protected] it looks like @ is escaped in the package's scope. So I imagine s3@aws-sdk version 1.2.3 could become pkg:gem/s3%[email protected].

yes but my points was, that the notion of "scope" is not part of purl because of namespaces. However I understand the reasoning here and backwards compatibility.

@hsbt

Few registries slowly drift towards purl: https://github.com/package-url/purl-spec

@indirect on top of that, for node you can have two versions of the same package in the same repo.

About the namespaces: would be good to estimate how many gems actually use their correct namespace vs patching other things or using completely different namespaces.

mensfeld avatar Apr 08 '22 09:04 mensfeld

@mullermp I guess you're right, we would need to deal with squatting as just another type of possible abuse, and define clear rules about it.

@indirect I think in the particular case of mail, it would mostly work because most Rails users don't use mail directly, and Rails is also in control of the dependency and how it's used, so Rails would change their dependency on mail to @rails/mail and they could also choose to namespace the Mail in their scoped gem to stay on the safe side and avoid any conflicts with people using the mail gem directly.

Anyways, this small benefit is just something that came to my mind as a potential extra benefit, but it's not even mentioned in the RFC. I think this RFC only proposes a way to allow gem name collision, but does not impose anything on what top level module a given gem should define.

deivid-rodriguez avatar Apr 08 '22 09:04 deivid-rodriguez

Another question is how to resolve dependencies? Should we use full (including namespace) gem identifier everywhere (including Gemfile.lock and gemspec dependencies specification)? I see the problem in compatibility again in here. We need to ensure transition is as smooth as possible.

simi avatar Apr 08 '22 09:04 simi

Yes, I think the new name should be used everywhere, otherwise it's impossible to differentiate differently scoped gems with the same name, no? I don't think there's much that can be done about compatibility, unfortunately, except for going down the route you suggested before: choosing a name scheme that's currently valid, just mostly unused. It would definitely make things smoother, although not so nice. I'm not fully sure which option is best.

To elaborate on what I said before about Bundler being able to "trampoline", the idea is that, say, support for the new naming scheme is added on Bundler 2.4. And someone upgrades to Bundler 2.4.0 to be able to bundle @rails/mail in their Gemfile. Then a Gemfile.lock file will be generated including the new incompatible naming scheme, and the BUNDLED WITH 2.4.0 marker. If someone bundles this Gemfile{.lock} file using Bundler 2.3.0, Bundler 2.3.0 will automatically detect that the application needs Bundler 2.4 and automatically upgrade itself, so things should just work. Unfortunately this feature is only very recent and versions older than 2.3.0 would still fail hard.

deivid-rodriguez avatar Apr 08 '22 09:04 deivid-rodriguez

@deivid-rodriguez my idea was to support both old/new name scheme (at least for some time). At RubyGems.org it could just check the client version (or we can make some additional header to ask for new scheme) and respond with given scheme.

gem 'rails--activerecord'
gem 'rails/activerecord' # PURL compatible if I understand it well (or any new scheme incompatible with old gem naming could be here)

Would work the same for some time. The latter one just will not be compatible with older RubyGems and Bundler.

I can ask about PURL and plans of other packing repo maintainers at next OSSF meeting.

simi avatar Apr 08 '22 10:04 simi

Mmmm... I think we should provide a migration mechanism so that gem authors can opt in to the new naming scheme, while still being compatible with the old naming scheme. So even if the activerecord gem starts using rails/activerecord naming, plain activerecord in Gemfiles should still work (and by work I mean it should also pick up newer releases using the new scheme). Once users are ready (they are using up to date clients) they can start using the new scheme. And once Rails chooses to do so, it can stop providing releases with legacy naming.

I'm not really sure how we can make the above work, but if we could do it, what would be the purpose of rails--activerecord?

deivid-rodriguez avatar Apr 08 '22 12:04 deivid-rodriguez

One thing I'd like to raise is implementation order of this feature vs adding teams/groups. It will be easier to implement teams and then hand out namespaces for those teams than to hand out namespaces per user later on.

Otherwise there will be a landrush for account names, which are largely unsupervised. Teams will need to start accounts to reserve the namespace. Once they use those accounts we lose the ability to assign responsibility for actions to individuals.

jchestershopify avatar Apr 08 '22 18:04 jchestershopify

@indirect if I'm understanding the original proposal:

# @rails/mail
module Rails::Mail
# mail
module Mail

Is that correct?

If that's true, then this would break the "soft forking" mentioned by @deivid-rodriguez (which happens to also be my personal primary motivation for this scoping idea). If we are expecting scoped gems to also change the internal Ruby modules, then I can't, for example, temporarily fork a transitive dependency to fix something and put it in my personal scope.

Fryguy avatar Apr 08 '22 21:04 Fryguy

@Fryguy it is impossible (or at least I can't image how to enforce that) for RubyGems to enforce or limit any kind of module/class structure in the gem codebase. That is not the plan.

simi avatar Apr 08 '22 21:04 simi