
API modification: add ability to fetch uniquely named packages; return list of versions available for each

Open peytondmurray opened this issue 3 years ago • 4 comments

Current Behavior

Right now, the /api/v1/package/ endpoint fetches the packages that are available for the user to install, and I can control which packages are returned by using the appropriate query parameters. For example, if I send a GET request to /api/v1/package/?page=1&size=10&distinct_on=name&distinct_on=version&sort_by=name, the response is a list of results with distinct (name, version) pairs:

{
    "count": 109441,
    "data": [
        {
            "build": "py38h9a4a7a8_1",
            "channel_id": 2,
            "id": 101824,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "383a4dde58ca57811108d44cde454e04d6ac861e77e5f200e8dad803f863d914",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.0.2"
        },
        {
            "build": "py39h20ed36d_1",
            "channel_id": 2,
            "id": 101833,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "545806eb2664ead4becc0ab147d54c7d8dbc523cd0b3ce7d7481c99430506c2e",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.0.3"
        },
        {
            "build": "py36h29bcdb0_0",
            "channel_id": 2,
            "id": 101835,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "33b624be3076a788e12853df26c19058a37b832dc69839a73378488c0a208788",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.1"
        },
        {
            "build": "py36h29bcdb0_0",
            "channel_id": 2,
            "id": 101839,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "d8c9436f006009a7bd64b3fde5253a4818f33510a32a036159198dcce7880ddb",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.2"
        },
        {
            "build": "py38ha5b31ff_0",
            "channel_id": 2,
            "id": 324514,
            "license": "MIT",
            "name": "21cmfast",
            "sha256": "4412925b69a7aa627374446eaa4e942c5626732e018e92a2574ea6a9e7d14234",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "version": "3.1.3"
        },
        {
            "build": "hbb7d975_1",
            "channel_id": 2,
            "id": 101847,
            "license": "Public Domain",
            "name": "2dfatmic",
            "sha256": "3b80e7812c33f825a20ccec3b8d16655de1b396a3b8dc71844a80c3b2823a9b7",
            "summary": "Two-Dimensional Subsurface Flow, Fate and Transport of Microbes and Chemicals Model",
            "version": "1.0"
        },
        {
            "build": "h618b193_0",
            "channel_id": 2,
            "id": 101850,
            "license": "GPLv2+",
            "name": "4ti2",
            "sha256": "d9f122bbb25d291391f1b4438e556ccee350e2487bde1fd3942d3577dcee8f42",
            "summary": "A software package for algebraic, geometric and combinatorial problems on linear spaces",
            "version": "1.6.9"
        },
        {
            "build": "pyh9f0ad1d_0",
            "channel_id": 2,
            "id": 27962,
            "license": "GPL-3.0-or-later",
            "name": "aadict",
            "sha256": "43e3e090dde8469e2514e1526ea446b9338a2204ffe94e51b51ac404d376447e",
            "summary": "An auto-attribute dict (and a couple of other useful dict functions)",
            "version": "0.2.3"
        },
        {
            "build": "pyhd8ed1ab_0",
            "channel_id": 2,
            "id": 27963,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "sha256": "86640eb12bba8927475c1356bf6a19211331437c21589da314253c8ffc662098",
            "summary": "Bayesian optimization structure search",
            "version": "1.1"
        },
        {
            "build": "pyhd8ed1ab_0",
            "channel_id": 2,
            "id": 27964,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "sha256": "cf886fdd2605e679c68a9209e565ca8457d3d17d71a93ce8956cda4d5827d5a9",
            "summary": "Bayesian optimization structure search",
            "version": "1.2"
        }
    ],
    "page": 1,
    "size": 10,
    "status": "ok"
}
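The request above can be reproduced with Python's standard library. This sketch only builds the URL path and query string from the parameters in this issue; no host or HTTP client is assumed.

```python
from urllib.parse import urlencode

# A list of (key, value) pairs keeps the parameter order and lets
# distinct_on appear twice, once for each column to deduplicate on.
params = [
    ("page", 1),
    ("size", 10),
    ("distinct_on", "name"),
    ("distinct_on", "version"),
    ("sort_by", "name"),
]

url = "/api/v1/package/?" + urlencode(params)
print(url)  # /api/v1/package/?page=1&size=10&distinct_on=name&distinct_on=version&sort_by=name
```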

For conda-store integration in Gator this is not optimal, because packages can have many versions: each fetch request returns only a few distinct packages, and most of the results are just different versions of the same package. In the response above, the 10 results cover only 5 distinct packages, and half of them are versions of 21cmfast.

Proposed Behavior

Instead of treating each version of a package as a separate result, treat all versions as a single package, and return a list of installable versions for each uniquely named package:

{
    "count": 109441,
    "data": [
        {
            "channel_id": 2,
            "license": "MIT",
            "name": "21cmfast",
            "summary": "A semi-numerical cosmological simulation code for the 21cm signal",
            "versions": ["3.0.2", "3.0.3", "3.1.1", "3.1.2", "3.1.3"]
        },
        {
            "channel_id": 2,
            "license": "Public Domain",
            "name": "2dfatmic",
            "summary": "Two-Dimensional Subsurface Flow, Fate and Transport of Microbes and Chemicals Model",
            "versions": ["1.0"]
        },
        {
            "channel_id": 2,
            "license": "GPLv2+",
            "name": "4ti2",
            "summary": "A software package for algebraic, geometric and combinatorial problems on linear spaces",
            "versions": ["1.6.9"]
        },
        {
            "channel_id": 2,
            "license": "GPL-3.0-or-later",
            "name": "aadict",
            "summary": "An auto-attribute dict (and a couple of other useful dict functions)",
            "versions": ["0.2.3"]
        },
        {
            "channel_id": 2,
            "license": "Apache-2.0",
            "name": "aalto-boss",
            "summary": "Bayesian optimization structure search",
            "versions": ["1.1", "1.2"]
        }
    ],
    "page": 1,
    "size": 10,
    "status": "ok"
}
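The proposed shape can be derived from the current one by grouping rows on name. Below is a minimal sketch of that transformation, run on a few rows copied from the first response (trimmed to the fields the proposal keeps). `group_by_name` is a hypothetical helper, not part of conda-store.

```python
def group_by_name(rows):
    """Collapse one row per (name, version) into one entry per package name.

    Assumes channel_id, license and summary are the same for every row of a
    given name, as in the example response; the first row seen for a name
    supplies those fields.
    """
    grouped = {}  # dicts keep insertion order, so input order is preserved
    for row in rows:
        entry = grouped.get(row["name"])
        if entry is None:
            entry = {
                "channel_id": row["channel_id"],
                "license": row["license"],
                "name": row["name"],
                "summary": row["summary"],
                "versions": [],
            }
            grouped[row["name"]] = entry
        # Skip repeats, e.g. several builds of the same version.
        if row["version"] not in entry["versions"]:
            entry["versions"].append(row["version"])
    return list(grouped.values())


summary_21cm = "A semi-numerical cosmological simulation code for the 21cm signal"
summary_boss = "Bayesian optimization structure search"
rows = [
    {"channel_id": 2, "license": "MIT", "name": "21cmfast", "summary": summary_21cm, "version": "3.0.2"},
    {"channel_id": 2, "license": "MIT", "name": "21cmfast", "summary": summary_21cm, "version": "3.0.3"},
    {"channel_id": 2, "license": "Apache-2.0", "name": "aalto-boss", "summary": summary_boss, "version": "1.1"},
    {"channel_id": 2, "license": "Apache-2.0", "name": "aalto-boss", "summary": summary_boss, "version": "1.2"},
]

for entry in group_by_name(rows):
    print(entry["name"], entry["versions"])
```

Because this keys only on name, the same package name published on two channels would be merged into one entry; grouping on (channel_id, name) instead would keep them apart, which is one possible answer to the multi-channel question raised in the next paragraph.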

I haven't yet worked out the implications for cases where different versions of a package are provided on different channels, or for handling different builds, but I wanted to get the discussion going. If the user wants more information about a specific package (individual version information, for example), I think they can use the search query parameter on the /api/v1/package/ endpoint. This change would greatly improve the user experience for browsing packages with the Gator JupyterLab extension.

peytondmurray, Dec 15 '21 23:12

Thanks for opening this issue, @peytondmurray. I totally agree that this is an important thing for the conda-store API to return, and I like your proposed behavior. Now I'm thinking about how this can be done efficiently within a database.

costrouc, Dec 22 '21 20:12

I don't want to forget about this, and I'm interested in trying to implement it. I'll give it a shot this weekend and report back here.

peytondmurray, Jan 20 '22 23:01

I've been working on implementing this feature, and I'm going to give some pushback on it. Suppose the endpoint returns:

{
    "license": "...",
    "name": "...",
    "channel_id": "...",
    "summary": "...",
    "versions": [
        ["package_id", "version_str"],
        "..."
    ]
}
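For clarity, the shape above can be written as a Python `TypedDict`. This is a sketch: the field types are assumptions based on the earlier responses (integer `channel_id` and package ids), and the example entry reuses the two aalto-boss rows from the first response in this issue.

```python
from typing import List, Tuple, TypedDict


class PackageVersions(TypedDict):
    """One entry per package name (hypothetical response schema)."""

    license: str
    name: str
    channel_id: int  # assumed integer, as in the current /package/ response
    summary: str
    # Each item pairs a package row id with its version string. JSON arrays
    # decode to lists, so a parsed payload would hold 2-item lists instead.
    versions: List[Tuple[int, str]]


entry: PackageVersions = {
    "license": "Apache-2.0",
    "name": "aalto-boss",
    "channel_id": 2,
    "summary": "Bayesian optimization structure search",
    "versions": [(27963, "1.1"), (27964, "1.2")],
}
```

Keeping the id next to each version would let a client refer to a specific package row directly, at the cost of a larger payload per package.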

A few of my concerns:

  • What is the endpoint going to be? /package/ is already taken, and I'd prefer not to add another endpoint that isn't fully orthogonal to it.
  • This is tricky to implement on the database side: it's easy to do for one database, but hard to do with SQLAlchemy in a way that all supported databases accept. Pagination also doesn't mean much, if anything, here, because how should we limit queries? By number of package names? But what if a package has 100+ versions? Then the API could return 100*N results for a page of N names.
  • This can be easily implemented client-side.
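To make the pagination concern concrete, here is a sketch of paging by distinct package name in plain SQL: an inner query selects one page of names, and the outer query joins back to fetch every version of those names. It runs against a hypothetical minimal `package(id, name, version)` table in SQLite, not conda-store's actual schema, and other databases may need different pagination syntax.

```python
import sqlite3

# Versions sort as plain strings here, which is not version order in
# general (e.g. "3.10" sorts before "3.9").
PAGE_BY_NAME = """
SELECT p.name, p.version
FROM package AS p
JOIN (
    SELECT DISTINCT name FROM package
    ORDER BY name
    LIMIT ? OFFSET ?
) AS names ON p.name = names.name
ORDER BY p.name, p.version
"""


def page_by_name(conn, page, size):
    """Return every (name, version) row for one page of `size` package names."""
    offset = (page - 1) * size
    return conn.execute(PAGE_BY_NAME, (size, offset)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE package (id INTEGER PRIMARY KEY, name TEXT, version TEXT)")
conn.executemany(
    "INSERT INTO package (name, version) VALUES (?, ?)",
    [
        ("21cmfast", "3.0.2"),
        ("21cmfast", "3.0.3"),
        ("21cmfast", "3.1.1"),
        ("2dfatmic", "1.0"),
        ("4ti2", "1.6.9"),
        ("aalto-boss", "1.1"),
        ("aalto-boss", "1.2"),
    ],
)

# A page of 2 names yields 4 rows: three versions of 21cmfast plus 2dfatmic.
print(page_by_name(conn, page=1, size=2))
```

The number of rows per page is bounded by how many versions the selected names have rather than by `size`, which is the 100*N problem described above.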

I agree that right now the API is not fast enough, which makes this take a while. I need to speed up these queries.

costrouc, Apr 22 '22 13:04

We got more color on this in our meeting today. We will limit the scope to individual packages.

E.g. /api/v1/package/<channel-identifier>/<package-name>
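A per-package endpoint like this could return just the versions of one package on one channel. The sketch below shows two pieces under stated assumptions: `package_versions_path` builds the proposed path, escaping each segment so an identifier containing `/` stays a single segment, and `versions_for` filters rows shaped like the current `/package/` response. Both helpers are hypothetical, treating `<channel-identifier>` as a free-form string is an assumption, and "conda-forge" is only an example channel name.

```python
from urllib.parse import quote


def package_versions_path(channel, name):
    """Build /api/v1/package/<channel-identifier>/<package-name> (hypothetical)."""
    # safe="" also escapes "/", so each value stays a single path segment.
    return f"/api/v1/package/{quote(channel, safe='')}/{quote(name, safe='')}"


def versions_for(rows, channel_id, name):
    """Return (id, version) pairs for one package on one channel, in input order."""
    return [
        (row["id"], row["version"])
        for row in rows
        if row["channel_id"] == channel_id and row["name"] == name
    ]


rows = [
    {"id": 101824, "channel_id": 2, "name": "21cmfast", "version": "3.0.2"},
    {"id": 101833, "channel_id": 2, "name": "21cmfast", "version": "3.0.3"},
    {"id": 27963, "channel_id": 2, "name": "aalto-boss", "version": "1.1"},
]

print(package_versions_path("conda-forge", "21cmfast"))  # /api/v1/package/conda-forge/21cmfast
print(versions_for(rows, 2, "21cmfast"))  # [(101824, '3.0.2'), (101833, '3.0.3')]
```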

costrouc, Apr 25 '22 15:04

@costrouc is this still valid?

kcpevey, Jul 06 '23 13:07

The API has changed since this issue was opened, and its performance has greatly improved since then.

Do we still want this feature of listing the versions of a given package?

pierrotsmnrd, Aug 18 '23 15:08

Perhaps not. We can close it, and if needed we can reopen it or open a better-scoped issue.

trallard, Aug 18 '23 17:08

Yeah, I agree with closing this issue. Now that @pierrotsmnrd has significantly improved the performance of the package API, I don't think there is as much of a use case, since you can just query the API normally.

costrouc, Aug 24 '23 22:08