[Proposal] Put each nimble package in its own file
Proposal
Put each nimble package in its own file/directory, or let each user create a subdirectory for themselves and put each package into one file.
Benefits
- User only has to add a file to the repo instead of editing one big file. Multiple users can add packages without conflicts
- Parser won't eventually be overwhelmed by packages (since Nim and nimble will be super popular one day!)
- Better git history
- If each user has their own directory, packages can be found easily by the same user (not that it isn't easy already with nimble search, but just browsing the github repo could be a fun discovery)
Downsides
- Lots of files. But git can handle it.
- Higher code complexity. Must write recursive directory walker and gather packages.
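To give a feel for how much extra code the walker would be, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `gather_packages`, the one-JSON-object-per-file layout, and the idea that any `*.json` file under the root is a package manifest. It is not taken from Nimble's source.

```python
import json
from pathlib import Path


def gather_packages(root):
    """Walk `root` recursively and collect every package manifest found.

    Assumes (hypothetically) that each *.json file holds one package object.
    """
    packages = []
    # rglob descends into every subdirectory, so per-user folders work too.
    # Sorting keeps the result stable between runs.
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            packages.append(json.load(f))
    return packages
```

The walk itself is short; the cost is in opening and parsing one file per package.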
So what do you think? It's just an idea, but maybe it'll be of use?
You are talking about packages.json, right?
@Araq yep!
Sure, we can do this :)
Parser won't eventually be overwhelmed by packages
This is not correct. The most common searches are by package name, tag and words from the description. Indexing the packages requires scanning the whole packages.json file. With separate files this would require recursive walks and parsing a large number of manifest files, which is even slower.
The usual solution for large archives is to precompute index files centrally (essentially a keyword -> package name k/v map) and ship those. Yet, we are not going to need this for quite a while!
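A rough sketch of what such a precomputed keyword index could look like, in Python. The function name `build_keyword_index` is made up for illustration, and the `name`, `tags` and `description` fields are assumed to be present in each package entry; the tokenization rule (lowercase ASCII letters and digits) is also just an assumption.

```python
import re
from collections import defaultdict


def build_keyword_index(packages):
    """Map each lowercase keyword to the sorted names of packages using it.

    Keywords come from the package name, its tags, and the words of its
    description (field names are assumptions about the manifest format).
    """
    index = defaultdict(set)
    for pkg in packages:
        words = {pkg["name"].lower()}
        words.update(tag.lower() for tag in pkg.get("tags", []))
        words.update(re.findall(r"[a-z0-9]+", pkg.get("description", "").lower()))
        for word in words:
            index[word].add(pkg["name"])
    # Plain dict of sorted lists so the result serializes cleanly.
    return {word: sorted(names) for word, names in index.items()}
```

With an index like this, a search becomes a single dictionary lookup instead of a scan over every package.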
@FedericoCeratto You make a good point about the package scanner taking more time to do recursive walks. However, the main point (which I failed to emphasize) is the greater ease of use.
It's not happening so much now, but when lots of users start submitting lots of packages to Nim, there are potentially going to be many PRs for packages.json and many conflicts to resolve either by the user or by @dom96 and yourself, or whoever is maintaining it in the future.
Again, this is just an idea. If it makes things too difficult or my reasoning is faulty, maybe packages.json is fine the way it is. I haven't looked at the source, so maybe this is the most efficient way.
One potential drawback with this proposal is that you won't be able to as easily add a new package using just the GitHub web interface.
@dom96 that's true as well. It's pretty easy to just add a package like that.
You can still fairly easily add new packages using the GitHub UI by clicking Create new file, you just have to then copy the JSON (or whatever) structure into the editor.
@euantorano that's a good point as well. I'm still a proponent of this change and I'll look into the effort when I have some time.
@jyapayne I'll create a proposal on the Nimble issue tracker for a package publishing service.
@FedericoCeratto that would be equally awesome!
packages.json is getting big...
Maybe just a single big directory where each package is a file?
If we have sub directories package names can conflict without people knowing.
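If subdirectories were used anyway, a check like the following could run in CI to catch such conflicts. This is only a sketch in Python: the function name `find_name_conflicts` is hypothetical, each `*.json` file is assumed to hold one package object with a `name` field, and comparing names case-insensitively is an assumption as well.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_name_conflicts(root):
    """Return {lowercased name: [files]} for names declared in several files."""
    seen = defaultdict(list)
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            name = json.load(f)["name"]
        # Assumption: names that differ only by case count as the same package.
        seen[name.lower()].append(str(path.relative_to(root)))
    return {name: files for name, files in seen.items() if len(files) > 1}
```

An empty result means every package name is declared exactly once.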
you won't be able to as easily add a new package using just the GitHub web interface.
Do people do that? :)
I like the idea of one package per directory (or maybe even per file?) rather than having to edit a humongous .json.
proposal:
- write a tool that splits the existing json into its individual array elements (i.e. one per package) and saves each one as mypkg.json for a package called mypkg
- the existing json could then be auto-generated from the individual mypkg.json files in that repo (i.e. either not checked in, or checked in but updated after each commit using a git commit hook). This makes migration easy: existing tools that depend on that single json file would continue to work
note:
checking it in might make it simpler, eg so that nimble refresh only has to download 1 file
file organization
packages/mypkg1.json # all packages go here
blacklist.json # ignore packages specified here
autogenerated_list.json # large auto generated package list
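The split tool and the regeneration step could be sketched roughly like this in Python. The function names `split_packages` and `regenerate_list` are invented for illustration, and the format of blacklist.json is not specified above, so a plain JSON array of package names is assumed here.

```python
import json
from pathlib import Path


def split_packages(big_json, out_dir):
    """Write each entry of the big package array to out_dir/<name>.json."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(big_json, encoding="utf-8") as f:
        for pkg in json.load(f):
            target = out / f"{pkg['name']}.json"
            target.write_text(json.dumps(pkg, indent=2) + "\n", encoding="utf-8")


def regenerate_list(pkg_dir, blacklist_json, out_json):
    """Rebuild the single package list, skipping blacklisted names.

    Assumption: blacklist_json is a JSON array of package names.
    """
    blacklist = set()
    blacklist_path = Path(blacklist_json)
    if blacklist_path.exists():
        blacklist = set(json.loads(blacklist_path.read_text(encoding="utf-8")))
    packages = []
    # Sorted by file name so the generated list is stable across runs.
    for path in sorted(Path(pkg_dir).glob("*.json")):
        pkg = json.loads(path.read_text(encoding="utf-8"))
        if pkg["name"] not in blacklist:
            packages.append(pkg)
    Path(out_json).write_text(json.dumps(packages, indent=2) + "\n", encoding="utf-8")
    return packages
```

Running `split_packages` once would do the migration; `regenerate_list` is the part a commit hook or CI job would call afterwards.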
One potential drawback with this proposal is that you won't be able to as easily add a new package using just the GitHub web interface.
Actually you can create files through web interface.
You can also create a folder structure using tab, though it's not intuitive.
Problems
- The packages.json file is getting bigger. It could become slow to parse. It is awkward to edit. It requires contributors to have a GitHub account. It requires PR approval to prevent package hijacking.
- Sometimes GH repos are deleted, some countries block GitHub, and GitHub refuses to serve some other countries.
- Implementing a Nim distribution: https://github.com/nim-lang/RFCs/issues/173
Proposal
Precompute index files centrally: a <keyword> -> <package name> k/v map, to look up packages by keyword/tag, and a <package name> -> <package metadata> map. Ship the two indexes as binary/compressed files for fast transfer and fast lookup time.
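As a minimal sketch of the two indexes and how they might be shipped, here is one possible shape in Python. Gzip-compressed JSON is just one candidate for "binary/compressed"; the function names and the `name`/`tags` fields are assumptions for illustration.

```python
import gzip
import json


def build_indexes(packages):
    """Return (keyword -> package names, package name -> metadata)."""
    by_keyword = {}
    by_name = {}
    for pkg in packages:
        by_name[pkg["name"]] = pkg
        for tag in pkg.get("tags", []):
            by_keyword.setdefault(tag.lower(), []).append(pkg["name"])
    return by_keyword, by_name


def write_index(path, mapping):
    """Serialize a dict as compact, gzip-compressed JSON."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(mapping, f, sort_keys=True, separators=(",", ":"))


def read_index(path):
    """Load a dict previously written by write_index."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

A client would download both files, call `read_index` on each, and answer tag searches and metadata lookups from memory.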
Run a simple service similar to pypi.org to handle package creation/update and generate the indexes. Initially it could feed from GH and/or use GH as a backend to store the indexes.
Future goals
- Store compressed tarballs of released packages. This is useful in case of dead repos, in countries that block GitHub, and in countries that GitHub refuses to serve.
- Check URL / git repo existence before accepting a new package.
Moonshot goals
- Let package owners sign metadata. Also use the signature to allow owners to update/delete packages without having to store logins and passwords.
- Verify signed tarballs from GH (and other sources) against the owner pubkey.
- A pool of "admin" pubkeys is allowed to update/delete other packages.
- A pool of "contributor" pubkeys can vet trusted packages by adding a "vote +1" signature. Nimble can warn before installing unvetted packages.
This implements most of the building blocks for a Nim distribution described in https://github.com/nim-lang/RFCs/issues/173
Update from a conversation with Araq: A small database is preferred over a directory because it can be: downloaded easily over plain http, replaced atomically on disk, checksummed to verify integrity, signed. SQLite is supported by the stdlib, has a stable format, works cross-platform, has really good lookup timing.
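To make the database idea concrete, here is a small sketch using Python's built-in sqlite3 module. The schema (a `packages` table and a `tags` table), the function names, and the `url`/`description`/`tags` fields are all assumptions for illustration; signing is left out.

```python
import hashlib
import os
import sqlite3


def build_db(packages, db_path):
    """Write packages into a fresh SQLite file, then atomically swap it in."""
    tmp_path = str(db_path) + ".tmp"
    if os.path.exists(tmp_path):
        os.remove(tmp_path)
    conn = sqlite3.connect(tmp_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE packages (name TEXT PRIMARY KEY, url TEXT, description TEXT)"
            )
            conn.execute("CREATE TABLE tags (tag TEXT, name TEXT)")
            conn.execute("CREATE INDEX tags_by_tag ON tags (tag)")
            for pkg in packages:
                conn.execute(
                    "INSERT INTO packages VALUES (?, ?, ?)",
                    (pkg["name"], pkg.get("url", ""), pkg.get("description", "")),
                )
                conn.executemany(
                    "INSERT INTO tags VALUES (?, ?)",
                    [(tag.lower(), pkg["name"]) for tag in pkg.get("tags", [])],
                )
    finally:
        conn.close()
    # os.replace swaps the file in one step when both paths are on the
    # same filesystem, so readers never see a half-written database.
    os.replace(tmp_path, str(db_path))


def sha256_of(path):
    """Checksum of the database file, for integrity verification."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def packages_with_tag(db_path, tag):
    """Look up package names by tag using the indexed tags table."""
    conn = sqlite3.connect(str(db_path))
    try:
        rows = conn.execute(
            "SELECT name FROM tags WHERE tag = ? ORDER BY name", (tag.lower(),)
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]
```

The checksum would be published next to the file so a client can verify the download before replacing its local copy.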
A small database is preferred over a directory because it can be: downloaded easily over plain http, replaced atomically on disk, checksummed to verify integrity, signed. SQLite is supported by the stdlib, has a stable format, works cross-platform, has really good lookup timing.
Nothing is simpler than a directory structure. SQLite is a dependency and I would not wish to make Nimble depend on it.
Everything else sounds good to me on the surface. It'll be the details of how you implement this that I may disagree with, but just go for it :)
Any progress here? I'm learning Nim but found the package list pretty ugly. Maybe we can split it into single files in another branch (using a script for migration), and compile them into a single json file using CI for backward compatibility. I would prefer using git or a tarball to fetch the package repo, though.