conda-forge.github.io

Adding conda-forge to repology

Open jakirkham opened this issue 7 years ago • 11 comments

Repology is a service that tracks the versions of packages in different Linux Distros. While conda-forge is not really a Linux Distro, it would still be interesting to have it added to that list both to raise awareness about conda-forge and generate interest in keeping things up-to-date.

cc @CJ-Wright

jakirkham avatar Jun 01 '18 06:06 jakirkham

Appears there is an open issue for this.

xref: https://github.com/repology/repology/issues/518

jakirkham avatar Jun 05 '18 03:06 jakirkham

Upstream closed the xref'd issue because there isn't a good way for them to determine how packages in conda-forge relate to those on other package indexes. They cited their requirements in this comment, which are further elaborated in this comment.

jakirkham avatar Feb 11 '19 08:02 jakirkham

This is also being followed up in bioconda (https://github.com/bioconda/bioconda-utils/issues/400).

jakirkham avatar Feb 11 '19 08:02 jakirkham

@wolfv @dholth @jaimergp, wonder if any of you have thoughts on this one? More generally how could we make our repodata more accessible to 3rd party services (like Repology)?

jakirkham avatar Jan 19 '23 02:01 jakirkham

There are some parts in the CZI grant that go in this direction. Specifically the consolidation of data sources for the bot infrastructure. If that happens through a database server (to be studied and decided, only an idea), we could think of exporting the needed metadata for repology as a view of the database.

The prefix.dev web panel might also have enough data at this point.

But IIRC, repology expects some namespacing (like python-) that we might need to infer from the host dependencies, and there might be some corner cases.

jaimergp avatar Jan 19 '23 15:01 jaimergp

Thanks for the update Jaime! 🙏 Was wondering if there was overlap with existing efforts.

Don't think repology cares about that so much as a way to consistently map between our naming and the canonical Python names. This is something we are admittedly lacking and may want ourselves for other reasons.

jakirkham avatar Jan 19 '23 20:01 jakirkham

Could we check whether the recipe has a PyPI URL and reverse-engineer the name from there? I see from the other ticket that repology would like one big mapping, not the huge source databases ("every package's index.json / meta.yaml / etc."), of which we have several. Oh, this is pretty good: https://github.com/regro/cf-graph-countyfair/tree/master/mappings/pypi ... https://github.com/repology/repology-rules/blob/master/README.md ? Maybe they need a pull request with the matcher?

It's not entirely clear to me from the repology tickets exactly what is missing from the conda data.
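
A minimal sketch of the "reverse-engineer it from the PyPI URL" idea, assuming the recipe's source URL has already been rendered (no Jinja left in it) and follows the usual `/packages/source/<letter>/<project>/<file>` layout. The function names and the host list are hypothetical, made up for illustration; they are not part of the bot or of countyfair.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hosts that commonly serve PyPI sdists. This list is an assumption, not exhaustive.
PYPI_HOSTS = {"pypi.io", "pypi.org", "pypi.python.org", "files.pythonhosted.org"}

# Matches "/packages/source/<first letter>/<project>/<filename>".
_SOURCE_PATH = re.compile(r"^/packages/source/[^/]+/(?P<project>[^/]+)/[^/]+$")


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_name_from_url(url: str) -> Optional[str]:
    """Return the normalized PyPI project name, or None if the URL is not recognized."""
    parts = urlparse(url)
    if parts.hostname not in PYPI_HOSTS:
        return None
    match = _SOURCE_PATH.match(parts.path)
    return normalize(match.group("project")) if match else None
```

Anything that returns None here (GitHub tarballs, hash-based pythonhosted URLs, and so on) would still need another source of truth, so this only covers the easy cases.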

dholth avatar Jan 19 '23 20:01 dholth

Good point Daniel! Yeah that seems more like what they are looking for.

Noticed importlib_metadata is missing from that mapping, so there may be a few gaps. Still, this seems approachable.

Maybe it is worth sharing this with them and seeing if that would work for their needs?

jakirkham avatar Jan 20 '23 07:01 jakirkham

Matt mentioned this site, which falls under the same scope of "exporting metadata for integration with external webservices": https://clearlydefined.io/. It's supposed to help with licenses and stuff, I think.

jaimergp avatar Jul 11 '23 06:07 jaimergp

I am taking a look at this again. IIUC, this is what we need:

  • A dataset that can be easily fetched (ideally in a single request, like channeldata.json or our multiple repodata.json files). This is done.
  • A reliable way of identifying Python "modules" (i.e. not "applications"). I think this means that we need to correctly identify all PyPI packages in conda-forge. This is not done yet.

As pointed out above, this could be done via the countyfair mappings, but that mapping relies on either source URLs from PyPI (not reliable enough according to the author) or a custom extra.mappings.python.pypi field in meta.yaml. We are not using this last option at all, but it's in the bot, and we could require it as part of the new staged-recipes contributions.

We would need to run some analysis on all feedstocks to check which packages are indeed on PyPI even if the URL doesn't indicate so, and add the extra field (around 2k packages, it seems?). Maybe this requires a bot and some human intervention.

Note: The Python "module" definition is a bit arbitrary now that there are things like cmake in PyPI 🤷
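
A rough sketch of how that first triage pass over the feedstocks could look, assuming each recipe's meta.yaml has already been rendered and parsed into a plain dict. The bucket names and helper functions are hypothetical, and the shape of the value stored under extra.mappings.python.pypi is an assumption here: the code only checks that something is present.

```python
from urllib.parse import urlparse

# Hosts treated as "source comes from PyPI". This list is an assumption.
PYPI_HOSTS = {"pypi.io", "pypi.org", "pypi.python.org", "files.pythonhosted.org"}


def explicit_pypi_mapping(meta: dict):
    """Return whatever is stored at extra -> mappings -> python -> pypi, or None."""
    node = meta
    for key in ("extra", "mappings", "python", "pypi"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def source_urls(meta: dict) -> list:
    """Collect source URLs; 'source' may be a dict or a list, 'url' a str or a list."""
    source = meta.get("source") or {}
    entries = source if isinstance(source, list) else [source]
    urls = []
    for entry in entries:
        url = entry.get("url") if isinstance(entry, dict) else None
        if isinstance(url, str):
            urls.append(url)
        elif isinstance(url, list):
            urls.extend(u for u in url if isinstance(u, str))
    return urls


def classify(meta: dict) -> str:
    """Sort a recipe into one of three (hypothetical) triage buckets."""
    if explicit_pypi_mapping(meta) is not None:
        return "explicit-mapping"
    if any(urlparse(u).hostname in PYPI_HOSTS for u in source_urls(meta)):
        return "pypi-source-url"
    return "needs-review"
```

The "needs-review" bucket is where a bot plus some human intervention would come in, for example by looking the name up on PyPI and proposing the extra field in a PR.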

jaimergp avatar Aug 23 '23 09:08 jaimergp

Just stumbled across this today (not for the first time) here.

I think it would probably make sense to split the display for conda-forge at least per OS (e.g. conda-forge (linux), conda-forge (osx), conda-forge (win); possibly even per arch). That might also make the metadata aggregation a bit simpler?
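
To illustrate the per-OS split, here is a small sketch that groups packages by OS from a channeldata.json-style dict. It assumes the layout as I understand it (a top-level "packages" mapping whose entries carry "version" and "subdirs"), and it treats noarch packages as available on every OS, which is a simplification. The function names are hypothetical.

```python
from collections import defaultdict

# Display labels as suggested above, keyed by the prefix of the conda subdir.
OS_LABELS = {
    "linux": "conda-forge (linux)",
    "osx": "conda-forge (osx)",
    "win": "conda-forge (win)",
}


def labels_for(subdirs) -> set:
    """Map subdirs such as 'linux-64' or 'osx-arm64' to display labels."""
    if "noarch" in subdirs:
        # Assumption: noarch packages are listed under every OS.
        return set(OS_LABELS.values())
    prefixes = (subdir.split("-", 1)[0] for subdir in subdirs)
    return {OS_LABELS[p] for p in prefixes if p in OS_LABELS}


def packages_by_os(channeldata: dict) -> dict:
    """Return {label: {package name: version}} from a channeldata-like dict."""
    grouped = defaultdict(dict)
    for name, info in channeldata.get("packages", {}).items():
        for label in labels_for(info.get("subdirs", [])):
            grouped[label][name] = info.get("version")
    return dict(grouped)
```

One caveat: this reuses a single version per package for every OS, so if versions differ between platforms, the per-subdir repodata.json files would be the better input.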

h-vetinari avatar Mar 07 '24 00:03 h-vetinari