janeway
janeway copied to clipboard
Add preprint API endpoint, add default pagination for Django REST Framework
Add a read-only preprints endpoint to the existing Janeway API. This is mostly the work of @tingletech as the commits show, with a few changes from me to get this endpoint to work with the current version of Janeway (with considerable help and cheerleading from @ajrbyers). Also adds pagination for the entire API.
Addresses most of #3010 but I think misses a few points
Thanks @hardyoyo !
There are two major issues that I would consider blockers on the preprints API endpoint based on my current tests, as well as some suggestions for how it might be improved:
-
Lack of ability to filter by repository: Currently the preprints endpoint displays data about all preprints in Janeway, and not just the specific repository one may be accessing. It may therefore be preferable to either a) add a repository filter, or b) add a repository endpoint which should be the main method of obtaining data about preprints in that repository. One should be able to query the repository endpoint to learn of the active public repositories, then query eg /repository/[id]/ or /repository/[short-name]/ to obtain the preprints under that repository.
-
Release of data that probably shouldn't be public: The author section seems to pull all data from the account table, including some data that users may not be aware is being shared (and which isn't shared via the UX in some cases):
- Email address (for both submitting and non-submitting authors);
- Whether account is active (for all authors).
There currently is an expectation that an API client with POST/PUT permissions would be able to write an email address to Janeway via the API, or to read an email address that it has previously provided if it has been granted such permissions. However a privacy/legal expert should review lest that be in violation of any regional policies/legislation/norms.
The second is the most critical to resolve.
There is additional metadata that I would expect to see in the preprint API endpoint. However, I don't consider those immediately blockers, so I'll include that in the feature request, then reference that comment here.
This PR has been updated to address the two major issues identified above: e-mail and is_active are now hidden from author data, and the api URL for each respository limits the output to just the contents of that repository. This code is currently deployed on our dev instance, and can be viewed here:
- https://dev.eartharxiv.org/api/preprints/
- https://dev.ecoevorxiv.org/api/preprints/
This is just a start, I know that the output might not be quite sufficient, but changes can be made fairly easily.
@hardyoyo This is epic to see, great work!! There are a few things missing that we'd like to see in a "GET" version, as the primary use case is to allow a manuscript submission system to make a single GET call to obtain metadata for a specific preprint (https://dev.eartharxiv.org/api/preprints/1234
) or series of preprints (https://dev.eartharxiv.org/api/preprints/
).
Regarding metadata to add: I think the essence of what we want is already in the JATS version of the OAI feed for preprints: https://github.com/BirkbeckCTP/janeway/pull/3098
It would be good to double check whether the OAI feed also includes the custom fields a Repository may ask during submission, e.g. the "Data availability" statement that we see in EA/EER. I believe those are in repository_repositoryfield
and repository_repositoryfieldanswer
. Those should also be included. :)
@hardyoyo This is epic to see, great work!! There are a few things missing that we'd like to see in a "GET" version, as the primary use case is to allow a manuscript submission system to make a single GET call to obtain metadata for a specific preprint (
https://dev.eartharxiv.org/api/preprints/1234
) or series of preprints (https://dev.eartharxiv.org/api/preprints/
).Regarding metadata to add: I think the essence of what we want is already in the JATS version of the OAI feed for preprints: #3098
It would be good to double check whether the OAI feed also includes the custom fields a Repository may ask during submission, e.g. the "Data availability" statement that we see in EA/EER. I believe those are in
repository_repositoryfield
andrepository_repositoryfieldanswer
. Those should also be included. :)
You can grab a single preprint already eg: https://dev.ecoevorxiv.org/api/preprints/1835/
From a quick perusal of the OAI feed, I think mostly this API is missing a link to download files, and a publisher field (though that could be inferred from the source of the API). The OAI feed does have a "description" field, but I believe this is "abstract" rebranded.
@hardyoyo Apologies for how long it took to test this. I think that everything is looking great. There's just one remaining thing that I would raise: the additional metadata fields which are requested at submission.
In the below example, I included text in the two optional "additional fields": "Conflict of interest statement" and "Reason for omitting associated data". They are visible in the public view, but not in the API.
The expectation is that any such field would always be visible in the API. If they were not filled in by the author (e.g. https://dev.eartharxiv.org/repository/view/2751/ ), then the value should be "null".
- Manager view: https://dev.eartharxiv.org/repository/manager/2754/
- Public view: https://dev.eartharxiv.org/repository/view/2754/
- API view: https://dev.eartharxiv.org/api/preprints/2754/
We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.
Ugh, sorry, only just now seeing this. I can try to focus on this tomorrow, but I have a site upgrade scheduled first thing in the morning.
On Nov 14, 2022 6:19 AM, Andy Byers @.***> wrote:
CAUTION: EXTERNAL EMAIL
We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.
— Reply to this email directly, view it on GitHubhttps://github.com/BirkbeckCTP/janeway/pull/3166#issuecomment-1313598696, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAEVJ4P2IYBILDNG5MZNTATWIIU3RANCNFSM6AAAAAAQXEWXEQ. You are receiving this because you were mentioned.Message ID: @.***>
So, my main question is, does it really have to be perfect before it gets in to a release?
On Nov 15, 2022 9:42 PM, Hardy Pottinger @.***> wrote: Ugh, sorry, only just now seeing this. I can try to focus on this tomorrow, but I have a site upgrade scheduled first thing in the morning.
On Nov 14, 2022 6:19 AM, Andy Byers @.***> wrote:
CAUTION: EXTERNAL EMAIL
We're tagging RC-7 tomorrow, so you want to sneak this one in it will need to be finished up today. Otherwise we can fold it in 1.5.1 in Spring 2023.
— Reply to this email directly, view it on GitHubhttps://github.com/BirkbeckCTP/janeway/pull/3166#issuecomment-1313598696, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AAEVJ4P2IYBILDNG5MZNTATWIIU3RANCNFSM6AAAAAAQXEWXEQ. You are receiving this because you were mentioned.Message ID: @.***>
Adding the additional model properties to this REST endpoint will be a journey of discovery, and may take some time (it will take this developer some time to figure out how to even add such a property/field to a dev repository). Here's a breadcrumb courtesy of @ajrbyers : https://stackoverflow.com/questions/17066074/modelserializer-using-model-property
Hey there, the CDL team just discussed this and will make a retraction of the requirement to include custom repository fields in the API for now given the difficulty. Let's add this instead as a new Feature Request to improve the API functionality in the next phase.
If someone can tidy up the conflicts I could probably add the custom fields for you for the 1.5 release.
I'll get the conflicts done, I think probably a new ticket for the improvements is the best route, and then you can do that ticket at your convenience?
Revising the title of this PR, since pagination was added in another PR
@hardyoyo I've added two comments, and there is a test failing on this branch.
test this please
The custom fields part of the original request has been moved to a new issue #3371
@hardyoyo is this ready for integration in 1.5.1?
I believe it is, I don't know why GitHub persists in saying changes are requested
erm... is this going to be merged into 1.5.1?
erm... is this going to be merged into 1.5.1?
~Hey @hardyoyo we do want to merge it but it needs rebasing to solve a few conflicts as it is a fairly old branch by now~
Never mind, no conflicts on this one