
Add preprint API endpoint, add default pagination for Django REST Framework

Open hardyoyo opened this issue 1 year ago • 1 comments

Add a read-only preprints endpoint to the existing Janeway API. This is mostly the work of @tingletech as the commits show, with a few changes from me to get this endpoint to work with the current version of Janeway (with considerable help and cheerleading from @ajrbyers). Also adds pagination for the entire API.
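For readers unfamiliar with it, default pagination in Django REST Framework is normally enabled through the `REST_FRAMEWORK` setting. A minimal sketch, assuming page-number pagination and a page size of 25; both are illustrative choices and may not match what this PR actually configures:

```python
# settings.py fragment (illustrative only; the pagination class and
# page size used in this PR may differ).
REST_FRAMEWORK = {
    # Applied to every list endpoint that does not set its own pagination_class.
    "DEFAULT_PAGINATION_CLASS": "rest_framework.pagination.PageNumberPagination",
    # Assumed value, chosen for illustration.
    "PAGE_SIZE": 25,
}
```

With this style of pagination, list responses are wrapped in an envelope with `count`, `next`, `previous` and `results` keys, so existing API clients that expect a bare list would need to read `results` instead.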

hardyoyo avatar Sep 27 '22 20:09 hardyoyo

Addresses most of #3010 but I think misses a few points

hardyoyo avatar Sep 27 '22 20:09 hardyoyo

Thanks @hardyoyo !

There are two major issues that I would consider blockers on the preprints API endpoint based on my current tests, as well as some suggestions for how it might be improved:

  1. Lack of ability to filter by repository: Currently the preprints endpoint displays data about all preprints in Janeway, not just those in the specific repository one is accessing. It may therefore be preferable to either a) add a repository filter, or b) add a repository endpoint that serves as the main method of obtaining data about preprints in that repository. One should be able to query the repository endpoint to learn of the active public repositories, then query, e.g., /repository/[id]/ or /repository/[short-name]/ to obtain the preprints under that repository.

  2. Release of data that probably shouldn't be public: The author section seems to pull all data from the account table, including some data that users may not be aware is being shared (and which isn't shared via the UX in some cases):

    • Email address (for both submitting and non-submitting authors);
    • Whether account is active (for all authors).

    There is currently an expectation that an API client with POST/PUT permissions would be able to write an email address to Janeway via the API, or to read an email address that it has previously provided if it has been granted such permissions. However, a privacy/legal expert should review this lest it violate any regional policies, legislation, or norms.

The second is the most critical to resolve.

There is additional metadata that I would expect to see in the preprint API endpoint. However, I don't consider those immediately blockers, so I'll include that in the feature request, then reference that comment here.

alainna avatar Oct 04 '22 19:10 alainna

This PR has been updated to address the two major issues identified above: email and is_active are now hidden from author data, and the API URL for each repository limits the output to just the contents of that repository. This code is currently deployed on our dev instance, and can be viewed here:

  • https://dev.eartharxiv.org/api/preprints/
  • https://dev.ecoevorxiv.org/api/preprints/
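The two changes can be pictured with a small framework-free sketch. This is not the code in this PR (which lives in the Django REST Framework serializers and views); the function names and the `repository` key are hypothetical, and only `email` and `is_active` come from the discussion above:

```python
# Illustration only: plain-Python stand-ins for the serializer and queryset
# changes. All names except "email" and "is_active" are hypothetical.
SENSITIVE_AUTHOR_FIELDS = {"email", "is_active"}


def public_author(author: dict) -> dict:
    """Return a copy of the author record without the sensitive fields."""
    return {
        key: value
        for key, value in author.items()
        if key not in SENSITIVE_AUTHOR_FIELDS
    }


def preprints_for_repository(preprints: list, repository_id: int) -> list:
    """Keep only the preprints that belong to the given repository."""
    return [p for p in preprints if p["repository"] == repository_id]
```

An allow-list of public fields would be a stricter alternative to the deny-list shown here, since new account columns would then stay private by default.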

hardyoyo avatar Oct 27 '22 18:10 hardyoyo

This is just a start; I know the output might not be quite sufficient, but changes can be made fairly easily.

hardyoyo avatar Oct 27 '22 18:10 hardyoyo

@hardyoyo This is epic to see, great work!! There are a few things missing that we'd like to see in a "GET" version, as the primary use case is to allow a manuscript submission system to make a single GET call to obtain metadata for a specific preprint (https://dev.eartharxiv.org/api/preprints/1234) or series of preprints (https://dev.eartharxiv.org/api/preprints/).

Regarding metadata to add: I think the essence of what we want is already in the JATS version of the OAI feed for preprints: https://github.com/BirkbeckCTP/janeway/pull/3098

It would be good to double check whether the OAI feed also includes the custom fields a Repository may ask during submission, e.g. the "Data availability" statement that we see in EA/EER. I believe those are in repository_repositoryfield and repository_repositoryfieldanswer. Those should also be included. :)

alainna avatar Oct 27 '22 18:10 alainna

You can already grab a single preprint, e.g.: https://dev.ecoevorxiv.org/api/preprints/1835/
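A client-side sketch of that single GET call, using only the Python standard library. The URL pattern is the one quoted in this thread; the helper names are hypothetical, and the response is assumed to be JSON:

```python
import json
from urllib.request import Request, urlopen


def preprint_url(base_url: str, preprint_id: int) -> str:
    """Build the detail URL for one preprint, following the pattern in this thread."""
    return f"{base_url.rstrip('/')}/api/preprints/{preprint_id}/"


def fetch_preprint(base_url: str, preprint_id: int, timeout: float = 10.0) -> dict:
    """GET one preprint's metadata and decode the (assumed) JSON body."""
    request = Request(
        preprint_url(base_url, preprint_id),
        headers={"Accept": "application/json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_preprint("https://dev.ecoevorxiv.org", 1835)` would request the preprint linked above, provided the dev instance is reachable.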

ajrbyers avatar Oct 27 '22 18:10 ajrbyers

From a quick perusal of the OAI feed, I think this API is mainly missing a link to download files and a publisher field (though the latter could be inferred from the source of the API). The OAI feed does have a "description" field, but I believe this is "abstract" rebranded.

hardyoyo avatar Oct 27 '22 21:10 hardyoyo