packaging-problems

Request: PEP to describe current Warehouse JSON API

Open brainwane opened this issue 5 years ago • 44 comments

Desired: a PEP to describe the current Warehouse JSON API, to:

  • lock in the existing standard as a guarantee for consumers (client applications like pip, pipenv, and more)
  • help other indexes (such as devpi and pypiserver and Artifactory) to implement the standard and be assured of interoperability

Does anyone want to volunteer to do this? It might take 15-20 hours total and would help a lot of folks out.

Stuff to cover

Includes:

  • how we version our APIs
  • important error codes or the lack thereof, and how the API deals with malformed requests or requests for nonexistent projects or releases
  • what fields are guaranteed to be returned
  • how we deal with capital letters
  • stuff PEP 503 reminds you of
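On the capital-letters point, PEP 503 already defines name normalization for the simple API: runs of `-`, `_`, and `.` collapse to a single `-`, and the result is lowercased. A minimal sketch of that rule follows; whether the JSON API PEP would adopt the same rule is an assumption, not something this thread settles.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name the way PEP 503 describes."""
    # Collapse any run of "-", "_" or "." into one "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()


print(normalize("Foo.Bar_baz"))
```

With a rule like this, a request for `Foo.Bar_baz` and one for `foo-bar-baz` resolve to the same project.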

Also, the current JSON API has some flaws, and so as we document it, that's an opportunity to find out what people expect, how the designers expected it to be used, what people need and want, etc. But do not think your job is to fix those things. Your job is to log those things and document the existing state.

Background

Today in IRC @dstufft, @techalchemy, and I discussed:

@brainwane: I think some developers of client apps have basically said "what is the risk of depending on the current Warehouse JSON API?" and have heard confusing things back from the main Warehouse developers. Like "we're gonna change the JSON API at some point in the future but...." and then have wildly different estimates of how long it will be till then. And Warehouse maintainers haven't had the resources, the dedicated time, to collaborate with contributors who want to help develop the new JSON API, review branches, etc. So this has left everyone in a state of uncertainty, so that developers of clients of Warehouse's current API have a hard time making engineering tradeoff decisions ("it's ok to write this and know we'll have to rewrite it in a year").

@dstufft said that interoperability depends on standardizing the Warehouse API:

@dstufft: if you want to work with things that aren't warehouse, you can't rely on the JSON API really, because it's non standard and support for it varies

@techalchemy: yeah it's pretty nuanced, other indexes would have to implement the entire data model of warehouse

Donald: if you only care about working with pypi itself, then the JSON API is fine

Dan: and/or at least parse wheel dependencies to json

Donald: it's almost certainly not going away

Donald: it will probably get deprecated whenever/if we ever get a next gen api

But it may be several months if not more before volunteers can design and implement a next-generation Warehouse JSON API. So how can we help consumers and peers of the current Warehouse API work with what currently exists? We agreed that it would be a good interim step to document the current API in a PEP. We estimate "documenting what exists today is probably a less than 10 hours of total work" for the initial draft.

I would like to see this written and accepted within a few months. But as @techalchemy notes: "Even if it's not accepted, as long as it makes it to draft status it can generally start being useful. A draft PEP means pip can move toward supporting something which will force adoption ...will force other index server implementations to start considering building support."

Checklist

  • [x] First, someone needs to volunteer to be the lead author on this PEP.

  • [ ] Warehouse developers fix pypa/warehouse#8090.

  • [ ] Author talks in this thread with @brainwane about a reasonable deadline and a schedule for interim checkins to write the first draft, and gets the thread on Discourse started.

  • [x] Author makes first PEP draft as a sketched-in outline

  • [ ] Author submits as a Work-In-Progress PR to the python/peps repo, and circulates on distutils-sig/discuss.python.org for comment, and to maintainers of Warehouse clients and other indexes, and revises in response to their comments

  • [ ] Author finalizes PEP and gets PEP accepted by BDFL-Delegate

  • [ ] We all celebrate! Now we have a standard that all the clients can feel guaranteed about using and that other indexes can implement!

After this

After this, we can get some dedicated volunteer time committed, from a Warehouse expert, to write the next-gen JSON API PEP (this will be a substantial task, and I'm pretty wary of getting a grant for it because it's way more research than development), and then get that discussed and accepted, and then apply for and get money to implement it.

brainwane avatar Jun 10 '20 22:06 brainwane

@webknjaz and @hugovk, you both came to mind as people who might like to do this task -- feel free to speak up if it's something you'd be interested in doing.

brainwane avatar Jun 10 '20 22:06 brainwane

@brainwane I haven't worked w/ that API myself and haven't written any PEPs ever. So I don't feel confident about starting this.

P.S. Your mention of #8090 is not clickable because it's a cross-repo link. I guess you can change it to pypa/warehouse#8090

webknjaz avatar Jun 11 '20 07:06 webknjaz

Thanks also for the suggestion. I've used the API a little bit but am in a similar situation, and also don't think I have the time to fully commit to it.

hugovk avatar Jun 12 '20 13:06 hugovk

@webknjaz link fixed, thanks.

@hugovk and @webknjaz - thanks for clearly saying no so we can move on and ask other people. :-)

@dholth and @cooperlees -- are either of you interested in taking this on, maybe part of it (see the checklist in the initial post for things that need doing)?

brainwane avatar Jun 25 '20 04:06 brainwane

@dholth and @cooperlees -- are either of you interested in taking this on, maybe part of it (see the checklist in the initial post for things that need doing)?

I want this, so I'll commit to this. I'll even help with the new API design and possible implementation. My only warning here is my English skills are bad and this will be my first PEP.

I'll put aside some time Sunday night to try to get the sketch outline started, and possibly we can chat on IRC more Monday, @brainwane.

cooperlees avatar Jun 25 '20 15:06 cooperlees

To clarify for my benefit - is the intention here to define a standard that all indexes must implement (in the same sense that PEP 503 covers the simple API) or to define and document how Warehouse (PyPI) operates?

The former would mean that we intend to allow tools to assume the existence of such an API and would mandate that all index implementations (devpi, pypiserver, Artifactory) should implement it.¹ And IMO it would mean that we should be collecting input from the developers of those implementations as well as Warehouse.

If it's intended simply to document the Warehouse API as "the reference implementation of a JSON API" then it's not so much of an interoperability standard and we can avoid those complexities (although conversely, it would be of more limited use for general tools like pip).

¹ Yes, it could be defined as an optional API, in which case we'd need a means of querying "do you support this?"

pfmoore avatar Jun 25 '20 15:06 pfmoore

@pfmoore I'd say all indexes, which is why one of the items on the checklist is

Author submits as a Work-In-Progress PR to the python/peps repo, and circulates on distutils-sig/discuss.python.org for comment, and to maintainers of Warehouse clients and other indexes, and revises in response to their comments

brainwane avatar Jun 25 '20 15:06 brainwane

But also I'll defer to folks like @dstufft @techalchemy @Julian @mplanchard @fschulze on Paul's question.

@cooperlees:

I want this, so I'll commit to this.

@cooperlees in saying this, you are a model of an open source citizen. Thank you. :-)

I'll even help with the new API design and possible implementation. My only warning here is my English skills are bad and this will be my first PEP.

Everybody's got a first time. :-) And I know other folks will help with refining the prose.

I'll put aside some time Sunday night to try to get the sketch outline started, and possibly we can chat on IRC more Monday, @brainwane.

Sounds great!

I think we'll also need a PEP sponsor. @jaraco @cjerdonek @zooba @gpshead @merwok are any of you open to sponsoring this?

brainwane avatar Jun 25 '20 15:06 brainwane

Cool - sorry, I missed that item. I presume that @dstufft would be BDFL-Delegate for this.

pfmoore avatar Jun 25 '20 15:06 pfmoore

I would say I don't think all repositories have to implement it, but rather the goal would be to standardize it so that tooling can say "I depend on a repository that implements this API". Repositories are of course free to say they don't support that API, but then those tools won't work with them. We should def try to get feedback from them though; if the answer from all of them is they can't or won't implement it, then maybe we need to think harder about the path forward for it.

One key thing I think I'd want to see in a PEP for this, is trying to explicitly document what use cases we're trying to make the current JSON API good for. Are we looking to just standardize it to function as a general purpose "pull data from PyPI" API, or are we looking to allow specialized tooling to use it for some purpose (as an example, do we want Bandersnatch to be able to use this for implementing mirroring? Does the current "shape" of the API allow that? If not what's our smallest change we can make to allow that? etc).

dstufft avatar Jun 25 '20 16:06 dstufft

https://github.com/devpi/devpi/issues/801 is an example of something to look at to figure out why they want this API standardized and to make sure our API actually satisfies their use case.

dstufft avatar Jun 25 '20 16:06 dstufft

Thanks Donald - sorry for misremembering and thanks for the correction.

brainwane avatar Jun 25 '20 16:06 brainwane

I'd also like the PEP to have a clear way for clients to query servers as to whether they support this API. Just trying a query and checking the response runs the risk of people exposing a "similar, but not the same" API and clients having no way of knowing.
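One way to picture such a check is a small discovery document that the client validates before trusting anything else the server returns. The sketch below is purely hypothetical: the `api_version` key, its `MAJOR.MINOR` format, and `SUPPORTED_MAJOR` are assumptions made for illustration, and nothing like them has been agreed in this thread.

```python
import json

SUPPORTED_MAJOR = 1  # hypothetical: the major version this client understands


def supports_json_api(body: str) -> bool:
    """Return True only if `body` is a discovery document we recognise.

    The "api_version" key and its "MAJOR.MINOR" format are assumptions
    for illustration; no such document is defined in this thread.
    """
    try:
        doc = json.loads(body)
    except ValueError:
        # Not JSON at all, e.g. an HTML error page.
        return False
    if not isinstance(doc, dict):
        return False
    version = doc.get("api_version")
    if not isinstance(version, str):
        # JSON, but not the document we expect ("similar, but not the same").
        return False
    major = version.partition(".")[0]
    try:
        return int(major) == SUPPORTED_MAJOR
    except ValueError:
        return False


print(supports_json_api('{"api_version": "1.0"}'))
```

The point of an explicit marker is that a server returning some other JSON shape is rejected, rather than being mistaken for a conforming one.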

pfmoore avatar Jun 25 '20 17:06 pfmoore

@havocp and @tiegz and @katzj -- if you use Warehouse's JSON API in https://github.com/librariesio/bibliothecary/ or in the Tidelift CLI tool then check out this question:

One key thing I think I'd want to see in a PEP for this, is trying to explicitly document what use cases we're trying to make the current JSON API good for. Are we looking to just standardize it to function as a general purpose "pull data from PyPI" API, or are we looking to allow specialized tooling to use it for some purpose (as an example, do we want Bandersnatch to be able to use this for implementing mirroring? Does the current "shape" of the API allow that? If not what's our smallest change we can make to allow that? etc).

brainwane avatar Jun 29 '20 15:06 brainwane

From the devpi side a big requirement for an API is relative links from a common root, because we support multiple indexes. Besides that I don't have much input at this point.

fschulze avatar Jun 30 '20 06:06 fschulze

I have an extreme draft up on my PEP fork here: https://github.com/cooperlees/peps/blob/warehouse_json_api/pep-9999.rst

What's the best way to have everyone be able to comment + add to? Should we use a Google doc and I transfer back to the rst? Is there a better way?

From the devpi side a big requirement for an API is relative links from a common root, because we support multiple indexes. Besides that I don't have much input at this point.

Do you mean for the releases and urls section "url"? Wouldn't you just put an absolute URL using your domain? Can you maybe give me an example of how you'd use a relative URL, and I'll maybe understand your use case better.

cooperlees avatar Jul 13 '20 14:07 cooperlees

@cooperlees the current json API is at https://pypi.org/pypi/[projectname]/json; tools like https://github.com/peterbe/hashin/ often hardcode that absolute URL. So even though it is possible to provide an alternate URL, it will always start with /pypi. With devpi there are many indexes. Each user can create several of the form https://example.com/username/indexname and it isn't possible for devpi to provide the PyPI json API for tools like that, because each index needs its own endpoint, for example https://example.com/username/indexname/+json (the + in there is to distinguish from project names, which live at https://example.com/username/indexname/projectname/). That is what I mean by relative vs absolute URL endpoints. I hope I was able to describe it properly.

fschulze avatar Jul 13 '20 14:07 fschulze

@cooperlees the current json API is at https://pypi.org/pypi/[projectname]/json; tools like https://github.com/peterbe/hashin/ often hardcode that absolute URL. So even though it is possible to provide an alternate URL, it will always start with /pypi. With devpi there are many indexes. Each user can create several of the form https://example.com/username/indexname and it isn't possible for devpi to provide the PyPI json API for tools like that, because each index needs its own endpoint, for example https://example.com/username/indexname/+json (the + in there is to distinguish from project names, which live at https://example.com/username/indexname/projectname/). That is what I mean by relative vs absolute URL endpoints. I hope I was able to describe it properly.

Ahh got it. Here I'd love to propose (in my PEP) that we keep the legacy URLs on PyPI (for legacy reasons) but in the standard move to something like the following (and implement it on PyPI; I will happily do that):

  • /json: Shows API version
  • /json/discover/$call_name: Paul Moore's request
  • /json/project/$project_name: pypi.org/project/$name like API
  • /json/p/$project_name: Alias of the above like /project

Totally open to better ideas, but something like this will allow you to get your per Index JSON API :)
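A rough sketch of how a client might build those URLs against any index root, so that a devpi-style index such as https://example.com/username/indexname gets its own JSON API. The function name `json_api_urls` and the returned dictionary keys are made up for illustration; only the four paths come from the list above.

```python
from urllib.parse import quote


def json_api_urls(index_root: str, project: str, call_name: str) -> dict:
    """Build the proposed endpoints relative to one index root.

    `json_api_urls` and the dictionary keys are hypothetical names; the
    path layout follows the proposal in this comment.
    """
    base = index_root.rstrip("/") + "/json"
    safe_project = quote(project, safe="")
    safe_call = quote(call_name, safe="")
    return {
        "version": base,                                # /json
        "discover": f"{base}/discover/{safe_call}",     # /json/discover/$call_name
        "project": f"{base}/project/{safe_project}",    # /json/project/$project_name
        "project_alias": f"{base}/p/{safe_project}",    # /json/p/$project_name
    }


urls = json_api_urls("https://example.com/username/indexname/", "requests", "project")
print(urls["project"])
```

Because nothing above the index root is hardcoded, the same client code can target pypi.org or a per-user index.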

cooperlees avatar Jul 13 '20 15:07 cooperlees

Ok - I finally sat down and described all the JSON fields whose purpose I could decipher.

Returned JSON fields I need help with:

  • info.platform
  • releases.VERSION.has_sig
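For anyone looking at where these two fields sit, here is a small sketch that reads them from a response. The `SAMPLE` document is hand-written for illustration and is not real PyPI data; the shape (an `info` object, and `releases` mapping each version to a list of per-file entries) reflects my understanding of the current response, so `has_sig` is read per file rather than per version.

```python
import json

# Hand-written sample for illustration only; not real PyPI data.
SAMPLE = """
{
  "info": {"name": "example-project", "version": "1.0", "platform": null},
  "last_serial": 1,
  "releases": {
    "1.0": [
      {"filename": "example_project-1.0-py3-none-any.whl", "has_sig": false}
    ]
  }
}
"""

doc = json.loads(SAMPLE)

# info.platform may be missing or null, so use .get() rather than indexing.
platform = doc["info"].get("platform")

# For each version, collect the filenames whose entry has a truthy has_sig.
signed = {
    version: [entry["filename"] for entry in files if entry.get("has_sig")]
    for version, files in doc["releases"].items()
}

print(platform, signed)
```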

Branch is here: https://github.com/cooperlees/peps/tree/warehouse_json_api

What's left to do before I put up a pull request for review by more PEP-savvy people? How do I get a PEP number, etc.?

I still expect this needs a lot of refinement, but I'm getting to the limits of my knowledge of the API from just using it. I think the best way forward is possibly having PyPI maintainers all take a pass at cleaning it up. I think I've done the bulk of the boring manual work of reading JSON files and trying to work out which fields we should make required, etc.

Thanks! Looking forward to closing this one out.

cooperlees avatar Aug 22 '20 17:08 cooperlees

https://github.com/cooperlees/peps/blob/warehouse_json_api/pep-9999.rst

For anyone else trying to get to the PEP quickly. :P

pradyunsg avatar Aug 22 '20 18:08 pradyunsg

What's left to do before I put up a pull request for review by more PEP-savvy people? How do I get a PEP number, etc.?

Brett recently answered a few questions related to this over on discuss.python.org.

In terms of the process, I think you'll also want to file a PR to packaging.python.org -- adding a page to https://github.com/pypa/packaging.python.org/tree/master/source/specifications detailing the final design that folks use/implement.

From https://discuss.python.org/t/how-to-propose-new-specs/4721/7?u=pradyunsg:

The way that I understand the situation is:

  • the PEP contains all the information like "Why did we do <thing-we-settled-on> and not <different-thing>"
  • the PR to packaging.python.org adds a page that describes <thing-we-settled-on>

pradyunsg avatar Aug 22 '20 19:08 pradyunsg

I still think we should clarify the location of the API to not make it pypi.org centric. I would propose that the base for PyPI be defined as https://pypi.org/json and that all other endpoints like /json/discover/$call_name are redefined from that base, i.e. $base/discover/$call_name. It should also be made clear that tools should strive to offer a way to configure the base to be usable with non PyPI package indexes like devpi.net

fschulze avatar Aug 23 '20 06:08 fschulze

I still think we should clarify the location of the API to not make it pypi.org centric. I would propose that the base for PyPI be defined as https://pypi.org/json and that all other endpoints like /json/discover/$call_name are redefined from that base, i.e. $base/discover/$call_name. It should also be made clear that tools should strive to offer a way to configure the base to be usable with non PyPI package indexes like devpi.net

That's the main intent and why I added the /json URLs on the PEP. Please feel free to suggest wording changes to make it clearer. I am a terrible writer. Just doing this cause I want the functionality, not cause I like writing. I actually dislike it a lot, so would appreciate ALL help I can get.

I think tools is scope creep for this PEP. This PEP is to just make a standard designed API so we can all implement it the same. Once we have that we should request tools to support it - i.e. different base Index URLs ... like pip can today.

cooperlees avatar Aug 23 '20 15:08 cooperlees

@kpfleming I think, based on https://discuss.python.org/t/pep-for-the-python-package-index-json-api/5717/16 , that you might want to check in here and give @cooperlees some feedback on the current draft.

brainwane avatar Feb 05 '21 21:02 brainwane

I totally missed the ping on this back in June, but happened to see a notification about it yesterday. Thanks for thinking of us! The proposed PEP seems straightforward enough to implement, and it doesn't conflict with anything pypiserver is currently providing. I have some minor questions (let me know if you'd prefer we had this conversation over on discuss.python.org -- I don't have an account there currently so figured I'd ask here):

  • What would non-PyPI repositories be expected to send for last_serial, which is described as being a required field defined as "Internal PyPI serial indicating last modification"?
  • Currently pypiserver doesn't bother to parse the metadata files in the packages that are uploaded, instead using the standardized filenames to parse package names and versions. As such, populating some of the required fields in the info response would require larger changes than just adding endpoints, specifically author, author_email, license, and project_url. Given pypiserver's goal of being able to immediately serve packages that are simply scp'ed or whatever to a server, we've avoided so far implementing a local metadata cache or anything like that. It's seeming more and more likely that we'll eventually have to do that regardless, but I'd be curious to know whether these fields are really required.

I also have some questions that are definitely outside the scope of the PEP, like how pip will handle backwards compatibility with the old simple API and whether the intent is for pip to eventually drop support for it, the answers to which will inform the degree of urgency in updating pypiserver to support the new API.

Definitely glad to see this effort. It'll be great to have a clear schema that we can implement again.

mplanchard avatar Feb 07 '21 03:02 mplanchard

  • What would non-PyPI repositories be expected to send for last_serial, which is described as being a required field defined as "Internal PyPI serial indicating last modification"?

For pip and many tools this is not really used. Bandersnatch uses it to ask for packages that have changed since serial X. This should just be some sort of incrementing integer. Every upload you could just increment it. I would envision this could even just be 0 on your mirrors, unless you want to make your package index mirrorable by bandersnatch.
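As a sketch of that idea, assuming nothing about how pypi.org actually stores serials: a single counter is bumped on every upload, each project remembers the serial of its latest change, and a mirror asks for everything newer than the serial it last saw. `TinyIndex` and its method names are made up for illustration.

```python
from typing import Dict, List


class TinyIndex:
    """Toy in-memory index; hypothetical, not how Warehouse works."""

    def __init__(self) -> None:
        self.last_serial = 0
        self._project_serial: Dict[str, int] = {}

    def record_upload(self, project: str) -> int:
        # Every upload bumps the index-wide serial and stamps the project.
        self.last_serial += 1
        self._project_serial[project] = self.last_serial
        return self.last_serial

    def changed_since(self, serial: int) -> List[str]:
        # Projects whose latest change is newer than the caller's serial.
        return sorted(
            name for name, s in self._project_serial.items() if s > serial
        )


index = TinyIndex()
index.record_upload("alpha")
index.record_upload("beta")
index.record_upload("alpha")
print(index.last_serial, index.changed_since(1))
```

An index that never needs to be mirrored could, as suggested above, simply report a constant.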

  • I think it's just based off some PostgreSQL ID on pypi.org (I apologize if there are other uses)
  • Currently pypiserver doesn't bother to parse the metadata files in the packages that are uploaded, instead using the standardized filenames to parse package names and versions. As such, populating some of the required fields in the info response would require larger changes than just adding endpoints, specifically author, author_email, license, and project_url. Given pypiserver's goal of being able to immediately serve packages that are simply scp'ed or whatever to a server, we've avoided so far implementing a local metadata cache or anything like that. It's seeming more and more likely that we'll eventually have to do that regardless, but I'd be curious to know whether these fields are really required.

I think you should just start off pulling the size from the file and use upload time etc. to fill in as much metadata as you can, and see how happy that makes your users. Otherwise, have a formal upload where all the metadata is calculated, and for your scp'ed files do it best-effort, imo.
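A best-effort version of that could look roughly like the sketch below, which takes the name and version from a wheel filename (`{distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl`) and leaves the metadata-only fields empty. Sending `None` for author, author_email, license, and project_url is an assumption; whether the PEP would allow that is exactly the open question. The function names are made up for illustration.

```python
from typing import Tuple


def parse_wheel_filename(filename: str) -> Tuple[str, str]:
    """Return (name, version) from a wheel filename, without opening the file."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # 5 parts without a build tag, 6 parts with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[0], parts[1]


def best_effort_info(filename: str) -> dict:
    """Fill what the filename gives us; leave the rest empty (an assumption)."""
    name, version = parse_wheel_filename(filename)
    return {
        "name": name,
        "version": version,
        # These need the package's metadata file, which is not parsed here.
        "author": None,
        "author_email": None,
        "license": None,
        "project_url": None,
    }


print(best_effort_info("requests-2.25.1-py2.py3-none-any.whl"))
```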

I also have some questions that are definitely outside the scope of the PEP, like how pip will handle backwards compatibility with the old simple API and whether the intent is for pip to eventually drop support for it, the answers to which will inform the degree of urgency in updating pypiserver to support the new API.

I would expect that once pypi.org supports this PEP, we would go make pip use it asap. I would also expect pip to keep the legacy methods for a period of time and then kill the non-PEP code. I am not a pip maintainer so I can't make an authoritative decision here, but would be down to help do this work, if I ever get this PEP through. I am sure this would be a GitHub issue etc. and I would just say to be involved in those PRs / issues and follow along.

  • People can also use the legacy mode or old pip version until you catch up

cooperlees avatar Feb 08 '21 04:02 cooperlees

@brainwane Thanks for the shoutout to bring me here :-)

@cooperlees I'd be happy to collaborate on this PEP with you, acting as the copy-editor/reviewer to help ensure that the content is readable and understandable. I have both a desire for this PEP to be published (so that my company's tooling can benefit from it) and plenty of experience in document review and editing, so hopefully that will be a good combination.

kpfleming avatar Feb 10 '21 11:02 kpfleming

Well I feel it's ready (and has been for quite some time) to just get polished up and have any technical issues debated out.

I'll try to rebase the commit and remind myself where we all are. I feel we just need approval from @ambv and @dstufft really.

@kpfleming - Happy for you to fork and PR or just go comment on the latest commit suggestions + fixes etc.

I'd love to land it and go and implement the endpoints for Warehouse ASAP.

  • Then fix bandersnatch to support as much of the static side of this API as it can

cooperlees avatar Feb 10 '21 18:02 cooperlees

I'd love to land it and go and implement the endpoints for Warehouse ASAP.

I assume it still needs to be published for review & discussion prior to approval (as far as I've seen it's not been posted to Discourse yet)? I'm very interested in this PEP but haven't paid much attention while it was in pre-PEP stage.

pfmoore avatar Feb 10 '21 20:02 pfmoore

OK, I'll put together a PR this weekend to try to get the pre-draft into a submittable state.

A question though: "go and implement the endpoints for Warehouse ASAP" implies that this PEP will result in work in Warehouse, but this PEP is supposed to document the existing API. Which way is this going to go?

kpfleming avatar Feb 11 '21 00:02 kpfleming