janeway icon indicating copy to clipboard operation
janeway copied to clipboard

Add preprint API endpoint, add default pagination for Django REST Framework

Open hardyoyo opened this issue 2 years ago • 1 comments

Add a read-only preprints endpoint to the existing Janeway API. This is mostly the work of @tingletech as the commits show, with a few changes from me to get this endpoint to work with the current version of Janeway (with considerable help and cheerleading from @ajrbyers). Also adds pagination for the entire API.

hardyoyo avatar Sep 27 '22 20:09 hardyoyo

Addresses most of #3010 but I think misses a few points

hardyoyo avatar Sep 27 '22 20:09 hardyoyo

Thanks @hardyoyo !

There are two major issues that I would consider blockers on the preprints API endpoint based on my current tests, as well as some suggestions for how it might be improved:

  1. Lack of ability to filter by repository: Currently the preprints endpoint displays data about all preprints in Janeway, and not just the specific repository one may be accessing. It may therefore be preferable to either a) add a repository filter, or b) add a repository endpoint which should be the main method of obtaining data about preprints in that repository. One should be able to query the repository endpoint to learn of the active public repositories, then query eg /repository/[id]/ or /repository/[short-name]/ to obtain the preprints under that repository.

  2. Release of data that probably shouldn't be public: The author section seems to pull all data from the account table, including some data that users may not be aware is being shared (and which isn't shared via the UX in some cases):

    • Email address (for both submitting and non-submitting authors);
    • Whether account is active (for all authors).

    There currently is an expectation that an API client with POST/PUT permissions would be able to write an email address to Janeway via the API, or to read an email address that it has previously provided if it has been granted such permissions. However a privacy/legal expert should review lest that be in violation of any regional policies/legislation/norms.

The second is the most critical to resolve.

There is additional metadata that I would expect to see in the preprint API endpoint. However, I don't consider those immediately blockers, so I'll include that in the feature request, then reference that comment here.

alainna avatar Oct 04 '22 19:10 alainna

This PR has been updated to address the two major issues identified above: e-mail and is_active are now hidden from author data, and the api URL for each respository limits the output to just the contents of that repository. This code is currently deployed on our dev instance, and can be viewed here:

  • https://dev.eartharxiv.org/api/preprints/
  • https://dev.ecoevorxiv.org/api/preprints/

hardyoyo avatar Oct 27 '22 18:10 hardyoyo

This is just a start, I know that the output might not be quite sufficient, but changes can be made fairly easily.

hardyoyo avatar Oct 27 '22 18:10 hardyoyo

@hardyoyo This is epic to see, great work!! There are a few things missing that we'd like to see in a "GET" version, as the primary use case is to allow a manuscript submission system to make a single GET call to obtain metadata for a specific preprint (https://dev.eartharxiv.org/api/preprints/1234 ) or series of preprints (https://dev.eartharxiv.org/api/preprints/).

Regarding metadata to add: I think the essence of what we want is already in the JATS version of the OAI feed for preprints: https://github.com/BirkbeckCTP/janeway/pull/3098

It would be good to double check whether the OAI feed also includes the custom fields a Repository may ask during submission, e.g. the "Data availability" statement that we see in EA/EER. I believe those are in repository_repositoryfield and repository_repositoryfieldanswer. Those should also be included. :)

alainna avatar Oct 27 '22 18:10 alainna

@hardyoyo This is epic to see, great work!! There are a few things missing that we'd like to see in a "GET" version, as the primary use case is to allow a manuscript submission system to make a single GET call to obtain metadata for a specific preprint (https://dev.eartharxiv.org/api/preprints/1234 ) or series of preprints (https://dev.eartharxiv.org/api/preprints/).

Regarding metadata to add: I think the essence of what we want is already in the JATS version of the OAI feed for preprints: #3098

It would be good to double check whether the OAI feed also includes the custom fields a Repository may ask during submission, e.g. the "Data availability" statement that we see in EA/EER. I believe those are in repository_repositoryfield and repository_repositoryfieldanswer. Those should also be included. :)

You can grab a single preprint already eg: https://dev.ecoevorxiv.org/api/preprints/1835/

ajrbyers avatar Oct 27 '22 18:10 ajrbyers

From a quick perusal of the OAI feed, I think mostly this API is missing a link to download files, and a publisher field (though that could be inferred from the source of the API). The OAI feed does have a "description" field, but I believe this is "abstract" rebranded.

hardyoyo avatar Oct 27 '22 21:10 hardyoyo

@hardyoyo Apologies for how long it took to test this. I think that everything is looking great. There's just one remaining thing that I would raise: the additional metadata fields which are requested at submission.

In the below example, I included text in the two optional "additional fields": "Conflict of interest statement" and "Reason for omitting associated data". They are visible in the public view, but not in the API.

The expectation is that any such field would always be visible in the API. If they were not filled in by the author (e.g. https://dev.eartharxiv.org/repository/view/2751/ ), then the value should be "null".

  • Manager view: https://dev.eartharxiv.org/repository/manager/2754/
  • Public view: https://dev.eartharxiv.org/repository/view/2754/
  • API view: https://dev.eartharxiv.org/api/preprints/2754/

alainna avatar Nov 10 '22 16:11 alainna

We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.

ajrbyers avatar Nov 14 '22 12:11 ajrbyers

Ugh, sorry, only just now seeing this. I can try to focus on this tomorrow, but I have a site upgrade scheduled first thing in the morning.

On Nov 14, 2022 6:19 AM, Andy Byers @.***> wrote:

CAUTION: EXTERNAL EMAIL

We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.

— Reply to this email directly, view it on GitHubhttps://github.com/BirkbeckCTP/janeway/pull/3166#issuecomment-1313598696, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAEVJ4P2IYBILDNG5MZNTATWIIU3RANCNFSM6AAAAAAQXEWXEQ. You are receiving this because you were mentioned.Message ID: @.***>

hardyoyo avatar Nov 16 '22 03:11 hardyoyo

So, my main question is, does it really have to be perfect before it gets in to a release?

On Nov 15, 2022 9:42 PM, Hardy Pottinger @.***> wrote: Ugh, sorry, only just now seeing this. I can try to focus on this tomorrow, but I have a site upgrade scheduled first thing in the morning.

On Nov 14, 2022 6:19 AM, Andy Byers @.***> wrote:

CAUTION: EXTERNAL EMAIL

We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.

— Reply to this email directly, view it on GitHubhttps://github.com/BirkbeckCTP/janeway/pull/3166#issuecomment-1313598696, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAEVJ4P2IYBILDNG5MZNTATWIIU3RANCNFSM6AAAAAAQXEWXEQ. You are receiving this because you were mentioned.Message ID: @.***>

hardyoyo avatar Nov 16 '22 03:11 hardyoyo

Adding the additional model properties to this REST endpoint will be a journey of discovery, and may take some time (it will take this developer some time to figure out how to even add such a property/field to a dev repository). Here's a breadcrumb courtesy of @ajrbyers : https://stackoverflow.com/questions/17066074/modelserializer-using-model-property

hardyoyo avatar Nov 16 '22 15:11 hardyoyo

Hey there, the CDL team just discussed this and will make a retraction of the requirement to include custom repository fields in the API for now given the difficulty. Let's add this instead as a new Feature Request to improve the API functionality in the next phase.

alainna avatar Jan 10 '23 18:01 alainna

If someone can tidy up the conflicts I could probably add the custom fields for you for the 1.5 release.

ajrbyers avatar Jan 10 '23 18:01 ajrbyers

I'll get the conflicts done, I think probably a new ticket for the improvements is the best route, and then you can do that ticket at your convenience?

hardyoyo avatar Jan 10 '23 18:01 hardyoyo

Revising the title of this PR, since pagination was added in another PR

hardyoyo avatar Jan 23 '23 22:01 hardyoyo

@hardyoyo I've added two comments, and there is a test failing on this branch.

ajrbyers avatar Jan 26 '23 11:01 ajrbyers

test this please

ajrbyers avatar Jan 26 '23 11:01 ajrbyers

The custom fields part of the original request has been moved to a new issue #3371

hardyoyo avatar Jan 27 '23 19:01 hardyoyo

@hardyoyo is this ready for integration in 1.5.1?

ajrbyers avatar Jul 27 '23 12:07 ajrbyers

I believe it is, I don't know why GitHub persists in saying changes are requested

hardyoyo avatar Jul 27 '23 22:07 hardyoyo

erm... is this going to be merged into 1.5.1?

hardyoyo avatar Aug 30 '23 15:08 hardyoyo

erm... is this going to be merged into 1.5.1?

~Hey @hardyoyo we do want to merge it but it needs rebasing to solve a few conflicts as it is a fairly old branch by now~

Never mind, no conflicts on this one

mauromsl avatar Sep 15 '23 15:09 mauromsl