repology-updater
Mark automatically imported packages and treat them specially
Some repositories are fully (Sisyphus autoimports, not currently used) or partially (Haskell and R modules in Nix) automatically imported. We need a way to mark these packages and treat them specially:
- Exclude them from statistics. Comparing them with manually maintained and tested packages is just not fair, and statistics artificially inflated by the size of some large module collection(s) become meaningless.
- Exclude them from spread counting. A package automatically duplicated in an additional repository is not thereby more widely spread. It also should not be unshadowed.
I'm not sure about this. At least on NixOS, Haskell packages are tested by running their test suites. Automated or not, these are packages that NixOS distributes.
Running tests guarantees nothing, as tests are never complete. These packages have zero added value: nobody has reviewed them, confirmed that they actually work and are compatible with consumer code, or even checked that they don't contain malicious code. There's no value for Repology either, as they just duplicate information from another source.
Just realized that we can probably just use forced rolling status for these, probably aliased to generated in the ruleset.
I agree that such packages should be marked appropriately in such explicit cases.
But I don't understand where the information would come from in the opposite case: the information that a package has been fully tested and reviewed, is compatible with consumer code, and doesn't contain anything harmful. Do you propose to rely on the authority of major well-known distributions?
There's no source for this information. Human maintained packages are considered 'good enough'.
Handling it as rolling should be enough for now. Applied to nix.
Let's get back to this. While we still need to treat generated packages specially (at the very least, exclude them from statistics calculation), they are still normal comparable packages which may be outdated, so handling them as rolling is not a good solution.
Instead, we could introduce an additional PackageFlag (e.g. AUTOIMPORTED) and exclude packages from statistics based on it (though we could still count a package if it's outdated).
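A minimal sketch of the flag-based exclusion, assuming a simplified package model. The `PackageFlags` enum, the `Package` dataclass, and the `counts_for_statistics` / `repository_stats` helpers are all hypothetical names for illustration, not repology-updater's actual classes. It takes one reading of "we can still count it if it's outdated": an auto-imported package is skipped unless it is outdated.

```python
from dataclasses import dataclass
from enum import Flag, auto


class PackageFlags(Flag):
    # Hypothetical flag set for illustration; not repology's real PackageFlags.
    NONE = 0
    AUTOIMPORTED = auto()


@dataclass
class Package:
    name: str
    version: str
    outdated: bool
    flags: PackageFlags = PackageFlags.NONE


def counts_for_statistics(package: Package) -> bool:
    """Decide whether a package contributes to repository statistics."""
    if PackageFlags.AUTOIMPORTED in package.flags:
        # Auto-imported packages are excluded, except when outdated:
        # an outdated package is still worth reporting.
        return package.outdated
    return True


def repository_stats(packages):
    """Compute simple totals over the packages that pass the filter."""
    counted = [p for p in packages if counts_for_statistics(p)]
    return {
        "total": len(counted),
        "outdated": sum(1 for p in counted if p.outdated),
    }
```

With one manual package and two auto-imported ones (one of them outdated), only the manual package and the outdated auto-imported package are counted. Whether an outdated auto-imported package should also enter the total, and not just the outdated count, is a design choice the thread leaves open.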
Also need to ignore R modules in nix.
The working code is there, but the question is (as always) how to display this to users. We already have a lot of "total" counts:
- number of packages
- number of projects
- number of non-unique projects

and we're going to add yet another one:

- number of projects which are not known as automatically imported :astonished:

We need to trim this down to the 2 or 3 most representative values and use them consistently. We also need docs with justification (#852).
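To make the list of totals concrete, here is a sketch of how all four could be derived for one repository from a flat list of packages. It assumes that a "non-unique" project is one also packaged by at least one other repository, and that `autoimported` is a per-package marker like the proposed AUTOIMPORTED flag; the `Pkg` tuple and `repository_totals` function are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Pkg(NamedTuple):
    repo: str
    project: str
    autoimported: bool  # hypothetical marker, per the AUTOIMPORTED proposal


def repository_totals(packages, repo):
    """Compute the candidate 'total' counts for a single repository."""
    # Which repositories package each project.
    repos_by_project = defaultdict(set)
    for p in packages:
        repos_by_project[p.project].add(p.repo)

    own = [p for p in packages if p.repo == repo]
    projects = {p.project for p in own}
    return {
        "packages": len(own),
        "projects": len(projects),
        # Projects that at least one other repository also packages.
        "nonunique_projects": sum(
            1 for pr in projects if len(repos_by_project[pr]) > 1
        ),
        # Projects with at least one package not known to be auto-imported.
        "non_autoimported_projects": len(
            {p.project for p in own if not p.autoimported}
        ),
    }
```

Even this toy version yields four different "totals" per repository, which illustrates why picking two or three of them and documenting the choice matters.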
While I agree that there should be a marker for auto-generated packages, here are a few arguments for counting Nixpkgs' auto-generated packages as "packaged":
- The packages go through roughly the same QA process as regular packages. Just because bumps happen using automation and in bulk doesn't mean the maintainers don't look at them. Conversely, just because other packages have "manual" maintainers doesn't mean they're well maintained. If you took the average, say, Python package ("manual" maintenance) and compared it to the average Haskell package (semi-automatic maintenance), you'd likely find that they work about equally well (or equally poorly).
  Individually maintained packages are also often auto-updated in Nixpkgs thanks to @r-ryantm. Does that make these packages of lesser value too? I don't think you can really make that assertion in either case.
- They add value to the distro. Specifically for Nix, you wouldn't be able to run any of them without them being "properly" packaged, because the language-specific package managers won't work. You also wouldn't be able to use them in declarative environments, which is, like, the whole point of Nix.
  If Arch had a similar auto-generated set of packages that could be managed via pacman, I'd count those packages towards Arch's number too.
- The Haskell packages in Nixpkgs are "manually" patched to behave; see https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/configuration-common.nix and https://github.com/NixOS/nixpkgs/blob/master/pkgs/development/haskell-modules/configuration-nix.nix. It's not a "run it and it's done" process performed automatically every so often; it's a manual task that makes use of great automation and is done in batches rather than individually.
I think there should perhaps be three categories of packages then:
- Manual (low degree of automation, individually maintained)
- Semi-automatic (high degree of automation, bulk maintained manually)
- Automatic (high degree of automation, no real manual maintenance)
I'd count manual and semi-automatic packages towards a distro's packages.
Nixpkgs Haskell packages would land squarely in semi-automatic, while e.g. our texlive packages would qualify as fully automatic (Repology doesn't know about these yet, but we have imported the entirety of CTAN; other distros such as Mandriva likely also use auto-imports here).
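The three-tier proposal can be sketched as a category per package plus a rule that only manual and semi-automatic packages count toward a distro's total. The `Maintenance` enum and `counted_package_total` function are hypothetical illustrations of the suggestion, not anything Repology implements.

```python
from enum import Enum


class Maintenance(Enum):
    # Hypothetical categories from the proposal; not a Repology API.
    MANUAL = "manual"                  # low automation, individually maintained
    SEMI_AUTOMATIC = "semi-automatic"  # high automation, bulk maintained manually
    AUTOMATIC = "automatic"            # high automation, no real manual maintenance


# Categories that count toward a distro's package total under the proposal.
COUNTED = {Maintenance.MANUAL, Maintenance.SEMI_AUTOMATIC}


def counted_package_total(categories):
    """Count packages toward a distro's total, skipping fully automatic ones."""
    return sum(1 for c in categories if c in COUNTED)
```

Under this scheme a Nixpkgs Haskell package (`SEMI_AUTOMATIC`) would be counted, while a texlive package imported from CTAN (`AUTOMATIC`) would not.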
I agree with all of this, and in fact I've dropped the idea of marking autoimported packages specially. This issue should've been closed a long time ago.
I actually still think fully auto-generated packages should at the very least be marked, and probably even excluded from total numbers, but I'd also be fine if that weren't the case.
As far as I can tell, the impact is currently limited: the CTAN import in nixpkgs (in the near future) and OpenMandriva, amounting to a few thousand packages. I won't bother with these, at least for now.