Adding conda-forge to repology
Repology is a service that tracks the versions of packages in different Linux Distros. While conda-forge is not really a Linux Distro, it would still be interesting to have it added to that list both to raise awareness about conda-forge and generate interest in keeping things up-to-date.
cc @CJ-Wright
Appears there is an open issue for this.
xref: https://github.com/repology/repology/issues/518
Upstream closed the xref'd issue because there isn't a good way for them to determine how packages in conda-forge relate to those on other package indexes. They cited their requirements in this comment, which are further elaborated in this comment.
This is also being followed up in bioconda (https://github.com/bioconda/bioconda-utils/issues/400).
@wolfv @dholth @jaimergp, wonder if any of you have thoughts on this one? More generally how could we make our repodata more accessible to 3rd party services (like Repology)?
There are some parts in the CZI grant that go in this direction. Specifically the consolidation of data sources for the bot infrastructure. If that happens through a database server (to be studied and decided, only an idea), we could think of exporting the needed metadata for repology as a view of the database.
The prefix.dev web panel might also have enough data at this point.
But IIRC, repology expects some namespacing (like python-) we might need to infer from the host dependencies and there might be some corner cases.
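A rough sketch of what "infer from the host dependencies" could look like, in case it helps the discussion. The function name, the `python-` prefix default, and the rule itself (anything with `python` in host is treated as a Python module) are assumptions for illustration, not something repology or conda-forge defines. Corner cases such as applications that merely need Python at build time would be mislabeled by this rule.

```python
from typing import Iterable


def repology_style_name(
    conda_name: str,
    host_requirements: Iterable[str],
    prefix: str = "python-",  # assumed prefix, taken from the "python-" example above
) -> str:
    """Return a namespaced name if the recipe lists python as a host dependency."""
    # A host entry looks like "python", "python >=3.8" or "pip";
    # the first whitespace-separated token is the package name.
    host_names = {req.split()[0].lower() for req in host_requirements if req.strip()}
    if "python" in host_names and conda_name != "python":
        return f"{prefix}{conda_name}"
    return conda_name


print(repology_style_name("requests", ["python >=3.8", "pip"]))
print(repology_style_name("cmake", ["make"]))
```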
Thanks for the update Jaime! 🙏 Was wondering if there was overlap with existing efforts
Don't think repology cares about that so much as a way to consistently map between our naming and the canonical Python names. This is something we are admittedly lacking and may want ourselves for other reasons.
We can check if the recipe has a PyPI URL and reverse-engineer it from there? I see from the other ticket that repology would like one big mapping, not the huge source database ("every package's index.json / meta.yaml / etc."), of which we have several. Oh, this is pretty good: https://github.com/regro/cf-graph-countyfair/tree/master/mappings/pypi. See also https://github.com/repology/repology-rules/blob/master/README.md. Maybe they need a pull request with the matcher?
I'm not entirely clear from the repology tickets exactly what is missing from the conda data.
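For reference, a minimal sketch of reverse-engineering the PyPI name from a source URL. It assumes the URL has already been rendered (no Jinja placeholders) and follows the common `/packages/source/<letter>/<name>/<file>` layout. The host list and function names are my own assumptions. Hash-based `files.pythonhosted.org` URLs are deliberately not handled and return `None`. The name normalization follows PEP 503.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hosts assumed to serve PyPI sdists; extend as needed.
PYPI_HOSTS = {"pypi.io", "pypi.org", "pypi.python.org", "files.pythonhosted.org"}
# Matches /packages/source/<letter>/<name>/<filename>
_SOURCE_PATH = re.compile(r"^/packages/source/[^/]+/([^/]+)/[^/]+$")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_name_from_url(url: str) -> Optional[str]:
    """Return the normalized PyPI project name, or None if it cannot be inferred."""
    parsed = urlparse(url)
    if parsed.hostname not in PYPI_HOSTS:
        return None
    match = _SOURCE_PATH.match(parsed.path)
    return normalize(match.group(1)) if match else None


print(pypi_name_from_url(
    "https://pypi.io/packages/source/i/importlib_metadata/importlib_metadata-6.0.0.tar.gz"
))
```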
Good point Daniel! Yeah that seems more like what they are looking for.
Noticed importlib_metadata is missing from the mapping, so we may be missing a few. Still, this seems approachable.
Maybe it is worth sharing this with them and see if that would work for their needs?
Matt mentioned this site, which falls under the same scope of "exporting metadata for integration with external webservices": https://clearlydefined.io/. It's supposed to help with licenses and stuff, I think.
I am taking a look at this again. IIUC, this is what we need:
- A dataset can be easily fetched (ideally single request, like channeldata.json or our multiple repodata.json files). This is done.
- A reliable way of identifying Python "modules" (i.e. not "applications"). I think this means that we need to correctly identify all PyPI packages in conda-forge. This is not done yet.
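To illustrate the first point, here is a small sketch that flattens an already-downloaded channeldata.json into one record per package. It assumes the file has a top-level `packages` mapping keyed by package name, with `version` and `subdirs` fields per entry. The function name and output shape are made up for illustration, and nothing here is a format repology has asked for.

```python
import json
from typing import Any, Dict, List


def flatten_channeldata(raw: str) -> List[Dict[str, Any]]:
    """Turn the text of channeldata.json into a sorted list of package records."""
    data = json.loads(raw)
    records = []
    # Assumed layout: {"packages": {"<name>": {"version": ..., "subdirs": [...]}}}
    for name, info in sorted((data.get("packages") or {}).items()):
        records.append(
            {
                "name": name,
                "version": info.get("version"),
                "subdirs": info.get("subdirs") or [],
            }
        )
    return records


sample = json.dumps(
    {"packages": {"zlib": {"version": "1.3", "subdirs": ["linux-64", "win-64"]}}}
)
print(flatten_channeldata(sample))
```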
As pointed out above, this could be done via the countyfair mappings, but that mapping relies on either source URLs from PyPI (not reliable enough according to the author) or a custom extra.mappings.python.pypi field in meta.yaml. We are not using this last option at all, but it's in the bot, and we could require it as part of the new staged-recipes contributions.
We would need to run some analysis in all feedstocks to check which packages are indeed in PyPI even if the URL doesn't indicate so, and add the extra field (around 2k packages, it seems?). Maybe this requires a bot and some human intervention.
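A possible shape for that analysis, as a sketch only. It takes one already-parsed and rendered meta.yaml (a plain dict) and sorts it into three buckets: explicit mapping present, PyPI-looking source URL, or needs human review. The bucket labels, helper names, and host list are hypothetical. I am also assuming the extra.mappings.python.pypi field holds the PyPI name directly, which should be checked against what the bot actually expects.

```python
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hosts assumed to indicate a PyPI source; extend as needed.
PYPI_HOSTS = {"pypi.io", "pypi.org", "pypi.python.org", "files.pythonhosted.org"}


def _dig(mapping: Any, *keys: str) -> Any:
    """Follow nested keys, returning None as soon as a level is missing."""
    for key in keys:
        if not isinstance(mapping, dict):
            return None
        mapping = mapping.get(key)
    return mapping


def _source_urls(meta: Dict[str, Any]) -> List[str]:
    """Collect source URLs; 'source' may be a dict or a list, 'url' a str or a list."""
    source = meta.get("source") or []
    if isinstance(source, dict):
        source = [source]
    urls: List[str] = []
    for entry in source:
        url = entry.get("url") if isinstance(entry, dict) else None
        if isinstance(url, str):
            urls.append(url)
        elif isinstance(url, list):
            urls.extend(u for u in url if isinstance(u, str))
    return urls


def triage(meta: Dict[str, Any]) -> Tuple[str, Optional[str]]:
    """Classify a recipe as 'explicit', 'pypi-url' or 'needs-review'."""
    explicit = _dig(meta, "extra", "mappings", "python", "pypi")
    if explicit:
        return ("explicit", explicit)
    if any(urlparse(u).hostname in PYPI_HOSTS for u in _source_urls(meta)):
        return ("pypi-url", None)
    return ("needs-review", None)


print(triage({"source": {"url": "https://github.com/x/y/archive/v1.tar.gz"}}))
```

Only the "needs-review" bucket would need a lookup against PyPI or a human decision; the "pypi-url" bucket could get the extra field added by a bot.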
Note: The Python "module" definition is a bit arbitrary now that there are things like cmake in PyPI 🤷
Just stumbled across this today (not for the first time) here.
I think it would probably make sense to split the display for conda-forge at least per OS (e.g. conda-forge (linux), conda-forge (osx), conda-forge (win); possibly even per arch). That might also make the metadata aggregation a bit simpler?