Registry feature wishlist
Here's a laundry list of features I'd like for the Mops registry. My goal is to have a discussion here. Once we've decided which of these we'd like to have I'm happy to contribute/help with implementation efforts.
The first four would be a pre-requisite for implementing a dependency version solver, while the latter are QoL improvements (the reporting feature is the kind of thing you want to have in place before you need it).
- Manifest support for version ranges, maybe a more restricted version of the cargo version requirement syntax. I'd recommend not messing with pre-releases. If we really want them, they can be allowed later on.
- Store compressed tarballs instead of individual files, with a single integrity hash per package version
- Pro
- Saves storage through compression, and by only requiring a single integrity hash
- Downloads a dependency in a single request
- Much cheaper to verify for integrity
- Enables easier mirroring
- Con
- Requires decompressing on the registry side to compute the content hash
- Would break the file viewer on the Mops website until we also implement unpacking the tar-balls
- Downloading a full index of package versions (with incremental updates). I'd recommend copying either the cargo registry index or the purescript registry index format and storing the index as a Git repository to get incremental updates for free. Not sure how easy/hard this would be to do with HTTP outcalls, or if it would require a periodic off-chain runner.
- Uploaded packages should only be allowed to reference other published packages, meaning no Git or Path dependencies. Could consider weakening this for dev-dependencies
Miscellaneous missing capabilities:
- Ability to report a package (Copyright, illegal content, etc)
- Yank/Deprecate command
- https://doc.rust-lang.org/cargo/commands/cargo-yank.html
- https://docs.npmjs.com/cli/v11/commands/npm-deprecate
- Unpublish command
- https://docs.npmjs.com/policies/unpublish <- apply similar restrictions
The first four would be a pre-requisite for implementing a dependency version solver
I suppose this is related to #285?
1. Manifest support for version ranges
What are the benefits or use cases for this?
Currently mops resolves deps as if they were defined with caret ^, so 1.2.3 means >=1.2.3 <2.0.0. A dep version can also be hard-pinned.
As long as package authors follow semver, this will work, but if not, it will break.
Users might want to always hard-pin dep version for better reliability. But this will lead to grow
2. Store compressed tarballs instead of individual files
I have doubts about the benefits and whether this is worth implementing. Is this really a "pre-requisite"?
Mops downloads packages in 12 threads (12 files). All downloaded packages are cached locally, so a user usually downloads the files of a package version only once. We would still need to keep the old storage type to work with existing clients.
* Ability to report a package (Copyright, illegal content, etc)
I think GitHub issues are a good way to report a package, but I think you mean banning/blocking a specific package on the Mops side?
4. Uploaded packages should only be allowed to reference other published packages, meaning no Git or Path dependencies.
Agree, github deps make the resolver less reliable.
Currently Path deps are not allowed to be published with a package. A Git dependency should contain a commit hash (some old packages are missing this).
GitHub deps prevent us from resolving packages on the backend.
We can disable publishing with github deps now, but we cannot rely on this for some time, since there are already published packages with github deps.
3. Downloading a full index of package versions (with incremental updates)
What is the motivation behind this? Is it like a backup?
6. Yank/Deprecate command
👍
7. Unpublish command
👍
Thank you so much for responding so quickly and comprehensively!
I suppose this is related to #285?
Yes, a solver would be a different strategy for resolving dependencies, that tries to find a combination of a single version for every dependency that satisfies the dependency constraints.
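To make the idea concrete, here is a minimal sketch of such a solver as a naive backtracking search. It is not how Mops resolves dependencies today, and production solvers use far more sophisticated algorithms. The `Index` shape, the `caret` helper and the example packages are all made up for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int, int]
Constraint = Callable[[Version], bool]

# Hypothetical in-memory index: package -> version -> {dependency: constraint}
Index = Dict[str, Dict[Version, Dict[str, Constraint]]]


def caret(base: Version) -> Constraint:
    """Caret constraint for major >= 1: >= base and < next major."""
    return lambda v: base <= v < (base[0] + 1, 0, 0)


def solve(index: Index, root_deps: Dict[str, Constraint]) -> Optional[Dict[str, Version]]:
    """Pick exactly one version per package so that every constraint holds."""

    def go(pending: List[Tuple[str, Constraint]],
           chosen: Dict[str, Version]) -> Optional[Dict[str, Version]]:
        if not pending:
            return chosen
        (name, ok), rest = pending[0], pending[1:]
        if name in chosen:
            # Already picked: the existing choice must satisfy this constraint too.
            return go(rest, chosen) if ok(chosen[name]) else None
        # Try the newest candidates first and backtrack on failure.
        for version in sorted(index.get(name, {}), reverse=True):
            if not ok(version):
                continue
            deps = list(index[name][version].items())
            result = go(rest + deps, {**chosen, name: version})
            if result is not None:
                return result
        return None

    return go(list(root_deps.items()), {})


# Made-up example data: http 2.0.0 needs base ^2.0.0, json needs base ^1.2.0.
index: Index = {
    "http": {(1, 0, 0): {"base": caret((1, 0, 0))},
             (2, 0, 0): {"base": caret((2, 0, 0))}},
    "json": {(1, 0, 0): {"base": caret((1, 2, 0))}},
    "base": {(1, 2, 0): {}, (1, 3, 0): {}, (2, 0, 0): {}},
}
solution = solve(index, {"http": lambda v: v >= (1, 0, 0), "json": caret((1, 0, 0))})
```

In this example the solver first tries http 2.0.0, finds that no single version of base satisfies both http and json, and backtracks to http 1.0.0.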
- Manifest support for version ranges
What are the benefits or use cases for this?
Currently mops resolves deps as if they were defined with caret ^, so 1.2.3 means >=1.2.3 <2.0.0. A dep version can also be hard-pinned. As long as package authors follow semver, this will work, but if not, it will break.
Users might want to always hard-pin dep version for better reliability. But this will lead to grow
I would definitely keep the current syntax to mean the "caretted" version constraint.
Use-cases are mostly the following two:
- A package author made a minor/patch bump that ends up breaking my build. Time has shown that it's really hard to gauge whether a change is breaking or not and people get it wrong all the time.
- My library works across multiple major versions of a given dependency. At the moment I cannot declare a dependency like
http = ">= 2, < 5"
The hard-pinning strategy kind of works for canister authors, but really isn't an option for library authors.
- Store compressed tarballs instead of individual files
I have doubts about the benefits and whether this is worth implementing. Is this really a "pre-requisite"?
Mops downloads packages in 12 threads (12 files). All downloaded packages are cached locally, so a user usually downloads the files of a package version only once. We would still need to keep the old storage type to work with existing clients.
Just to give a baseline, here's the current size of [email protected] when downloaded by mops uncompressed vs compressed:
872K [email protected]
136K [email protected]
So compression alone would allow us to store 6-7 times as many packages in the same storage canister and also make downloads that much faster. Given that you're usually downloading N packages at the same time, there's plenty of parallelism there that doesn't require splitting up a package.
It's a prerequisite in the sense that the index I'm referring to above would only store a single content hash per package version rather than N hashes + file paths.
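As a rough sketch of what "one tarball, one hash" could look like: the snippet below packs a package's files into a reproducible gzip-compressed tarball and derives a single SHA-256 from it. Hashing the compressed bytes (rather than the unpacked content) and the helper names `pack`, `integrity` and `unpack` are assumptions for illustration, not a description of how the Mops registry works.

```python
import gzip
import hashlib
import io
import tarfile
from typing import Dict


def pack(files: Dict[str, bytes]) -> bytes:
    """Pack files into a gzip-compressed tarball with reproducible bytes."""
    raw = io.BytesIO()
    with tarfile.open(fileobj=raw, mode="w") as tar:
        for path in sorted(files):  # stable member order
            info = tarfile.TarInfo(name=path)
            info.size = len(files[path])
            info.mtime = 0  # no timestamps, so equal input gives equal output
            tar.addfile(info, io.BytesIO(files[path]))
    return gzip.compress(raw.getvalue(), mtime=0)


def integrity(tarball: bytes) -> str:
    """A single integrity hash for the whole package version."""
    return hashlib.sha256(tarball).hexdigest()


def unpack(tarball: bytes, expected: str) -> Dict[str, bytes]:
    """Verify the hash once, then extract all regular files into memory."""
    if integrity(tarball) != expected:
        raise ValueError("integrity hash mismatch")
    with tarfile.open(fileobj=io.BytesIO(tarball), mode="r:gz") as tar:
        return {m.name: tar.extractfile(m).read()
                for m in tar.getmembers() if m.isfile()}
```

A client would download the tarball in one request, call `unpack` with the hash taken from the index, and write the returned files into its local cache.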
Ability to report a package (Copyright, illegal content, etc)
I think GitHub issues are a good way to report a package, but I think you mean banning/blocking a specific package on the Mops side?
I think we'd probably want a button/form on the mops.one website, or at the very least an E-Mail address. A lawyer won't bother creating a GH account, but you really want to give them an opportunity to send you an E-Mail before they sue ;)
- Uploaded packages should only be allowed to reference other published packages, meaning no Git or Path dependencies.
Agree, github deps make the resolver less reliable.
Currently Path deps are not allowed to be published with a package. A Git dependency should contain a commit hash (some old packages are missing this).
GitHub deps prevent us from resolving packages on the backend.
We can disable publishing with github deps now, but we cannot rely on this for some time, since there are already published packages with github deps.
:+1:
- Downloading a full index of package versions (with incremental updates)
What is the motivation behind this? Is it like a backup?
The motivation is to enable version solving, as the algorithms used there rely on fast access to a lot of version and dependency information. It would also make it easier to maintain a package-set like https://github.com/christoph-dfinity/new-base-package-set without scraping the mops canisters.
It doesn't really function as a backup, as it doesn't hold any of the actual package contents, but it does give some amount of redundancy as the index does hold the integrity hashes for package versions.
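For a feel of what such an index could contain, here is a small sketch with one JSON object per line and one line per published version. The field names (`name`, `vers`, `deps`, `cksum`, `yanked`) loosely follow the cargo registry index. The package data, the placeholder hashes, and the idea that Mops would use exactly this shape are assumptions.

```python
import json
from typing import List

# Hypothetical index file for a single package. A new release only appends a
# line, which is what makes incremental updates through Git cheap.
INDEX_FILE = """\
{"name": "http", "vers": "1.0.0", "deps": [{"name": "base", "req": "^1.0.0"}], "cksum": "placeholder-hash-1", "yanked": false}
{"name": "http", "vers": "1.1.0", "deps": [{"name": "base", "req": "^1.0.0"}], "cksum": "placeholder-hash-2", "yanked": true}
{"name": "http", "vers": "1.2.0", "deps": [{"name": "base", "req": "^1.2.0"}], "cksum": "placeholder-hash-3", "yanked": false}
"""


def load_index(text: str) -> List[dict]:
    """Parse a line-delimited JSON index file into a list of version entries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def candidates(entries: List[dict], include_yanked: bool = False) -> List[str]:
    """Versions a solver may pick from, skipping yanked ones by default."""
    return [e["vers"] for e in entries if include_yanked or not e["yanked"]]


entries = load_index(INDEX_FILE)
available = candidates(entries)
```

Note that the entries hold only metadata and hashes, never package contents, which matches the point above that the index is not a backup.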
- A package author made a minor/patch bump that ends up breaking my build. Time has shown that it's really hard to gauge whether a change is breaking or not and people get it wrong all the time.
It looks like version ranges will actually cause this problem? Because we are relaxing the package versions that we accept.
Or we should add stricter checks so that a version range can only refer to already published versions.
For example if http package's latest published version is 2.0.5, you can specify http = ">=1.1.0, <=2.0.5", but not http = ">= 2.0.0, <3.0.0".
I remember some issues with npm when I used ~ and ^: the build could start failing because you removed node_modules and ran npm install, or the build worked for you, but someone else who git clone'd the repo and ran npm install would get newer dep versions. Maybe it was before the lockfile was introduced... I would like to avoid these issues.
For canister builders mops guarantees that final resolved deps versions will not change if you don't change deps in mops.toml.
So "A package author made a minor/patch bump that ends up breaking my build" will not happen for canister builders until they update mops.toml
It looks like version ranges will actually cause this problem? Because we are relaxing the package versions that we accept.
Sorry I don't understand. What I'm saying is that with the current state 1.4.1 actually means >= 1.4.1, < 2.0.0, and if version 1.6.0 turns out to be incompatible with my library I can't release a patch update that fixes the dependency to say >= 1.4.1, < 1.6.0 at the moment.
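A small sketch of the difference, assuming plain MAJOR.MINOR.PATCH versions without pre-releases and a major version of at least 1 (how 0.x versions should behave under the caret rule is left out here). The function names are made up for illustration.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def caret_allows(base: str, candidate: str) -> bool:
    """Current behaviour as described above: 1.4.1 means >= 1.4.1, < 2.0.0."""
    low = parse(base)
    return low <= parse(candidate) < (low[0] + 1, 0, 0)


def range_allows(lower: str, upper: str, candidate: str) -> bool:
    """Explicit range: >= lower, < upper."""
    return parse(lower) <= parse(candidate) < parse(upper)


# With the caret rule, the incompatible 1.6.0 is always accepted.
caret_accepts_bad = caret_allows("1.4.1", "1.6.0")
# With an explicit range, a patch release can exclude it.
range_accepts_bad = range_allows("1.4.1", "1.6.0", "1.6.0")
```

Here `caret_accepts_bad` is true while `range_accepts_bad` is false, which is exactly the handle a library author is missing today.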
For canister builders mops guarantees that final resolved deps versions will not change if you don't change deps in mops.toml
I was focusing on library authors here. But for Canister authors this also matters. The lockfile will prevent their current build from breaking, but starting a new canister, adding a library, or updating their dependencies will all run into the same issue.
Sorry I don't understand. What I'm saying is that with the current state 1.4.1 actually means >= 1.4.1, < 2.0.0, and if version 1.6.0 turns out to be incompatible with my library I can't release a patch update that fixes the dependency to say >= 1.4.1, < 1.6.0 at the moment.
I meant: what if we disallow ranges that extend beyond the published versions? That way a newer dep version will not break existing packages.
For example if the latest published version is 1.4.1:
>=1.0.0, <1.2.2 - ok
>=1.0.0, <=1.4.1 - ok
>=1.0.0, <1.5.0 - error
>=1.0.0, <3.0.0 - error
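The rule behind these four examples could be checked at publish time roughly like this. The sketch assumes the simplest reading of the proposal: the version named in the upper bound must not be greater than the latest published version, for both `<` and `<=`. The function names and the requirement parsing are made up for illustration.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def upper_bound(requirement: str) -> Version:
    """Extract the version of the '<' or '<=' clause from e.g. '>=1.0.0, <1.5.0'."""
    for clause in requirement.split(","):
        clause = clause.strip()
        if clause.startswith("<"):
            return parse(clause.lstrip("<="))
    raise ValueError("requirement has no upper bound")


def stays_within_published(requirement: str, latest: str) -> bool:
    """Assumed rule: the upper bound may not point past the latest published version."""
    return upper_bound(requirement) <= parse(latest)


latest = "1.4.1"
results = {
    req: stays_within_published(req, latest)
    for req in [">=1.0.0, <1.2.2", ">=1.0.0, <=1.4.1", ">=1.0.0, <1.5.0", ">=1.0.0, <3.0.0"]
}
```

Under this reading a bound like `<1.4.2` would also be rejected, even though it admits nothing newer than 1.4.1 today. Whether that is desirable is an open detail of the proposal.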
I was focusing on library authors here. But for Canister authors this also matters. The lockfile will prevent their current build from breaking, but starting a new canister, adding a library, or updating their dependencies will all run into the same issue.
Here too, with the above restriction, the build can only break if we change mops.toml.
UPD: The downside is that a package author cannot specify >= 1.2.2, < 2.0.0 to avoid frequent dep updates whenever a newer dep version is released.
(Sorry for the late response, had a couple busy days)
I don't think that's a workable approach, as any release of a library basically creates a "ripple" where every depending package now also needs a new release just to bump its dependency. This would basically be a "DDOS" on the package ecosystem.
SemVer "works" in the general case, but you need a handle in the few cases where it doesn't (even if just temporarily).