Extension of API `{id}/versions` and `{id}/versions/{versionId}` with an optional `excludeMetadataBlocks` parameter
What this PR does / why we need it:
Extension of API {id}/versions and {id}/versions/{versionId} with an optional excludeMetadataBlocks parameter
that specifies whether the metadataBlocks should be omitted from the output. It defaults to false, preserving backward
compatibility. (Note that for a dataset with a large number of versions and/or metadataBlocks, including the metadata blocks
can dramatically increase the volume of the output.)
We see slow responses from api/datasets/%s/versions because of the large response body. Most of the information included (the metadataBlocks) is not needed, as we only want to display a dropdown list of all available versions.
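To illustrate the dropdown use case, here is a minimal sketch of building version labels from a response that carries no metadataBlocks. The field names (`data`, `versionNumber`, `versionMinorNumber`, `versionState`) are assumed from the usual shape of Dataverse version JSON, and `version_labels` is a hypothetical helper, not part of this PR.

```python
def version_labels(response):
    """Return display labels for each version in a versions-list response.

    Assumption: published versions carry versionNumber/versionMinorNumber,
    while a draft has no version number and only a versionState.
    """
    labels = []
    for version in response.get("data", []):
        number = version.get("versionNumber")
        if number is None:
            # No number yet: fall back to the state (e.g. "DRAFT").
            labels.append(version.get("versionState", "DRAFT"))
        else:
            minor = version.get("versionMinorNumber", 0)
            labels.append(f"{number}.{minor}")
    return labels


# Hand-written sample payload (not real server output).
sample = {
    "status": "OK",
    "data": [
        {"id": 3, "versionState": "DRAFT"},
        {"id": 2, "versionNumber": 1, "versionMinorNumber": 1,
         "versionState": "RELEASED"},
        {"id": 1, "versionNumber": 1, "versionMinorNumber": 0,
         "versionState": "RELEASED"},
    ],
}
labels = version_labels(sample)
```

None of the fields used here come from the metadata blocks, which is why omitting them does not affect this kind of listing.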
Which issue(s) this PR closes:
Closes #10171
Suggestions on how to test this: Call the API once with the flag and once without.
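The comparison suggested above could be scripted roughly as follows. This is a sketch under assumptions: the flag is passed as a query parameter `excludeMetadataBlocks=true` (in the same way as `excludeFiles`), the dataset is published so no API token is needed, and `versions_url`, `fetch_size`, and `compare_sizes` are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def versions_url(base_url, dataset_id, exclude_metadata_blocks=None):
    """Build the URL of the dataset versions endpoint.

    When exclude_metadata_blocks is None the parameter is left out, so the
    server-side default (false, per the PR description) applies.
    """
    url = f"{base_url.rstrip('/')}/api/datasets/{dataset_id}/versions"
    if exclude_metadata_blocks is not None:
        query = urllib.parse.urlencode(
            {"excludeMetadataBlocks": str(exclude_metadata_blocks).lower()}
        )
        url = f"{url}?{query}"
    return url


def fetch_size(url):
    """Return the response body size in bytes (needs a reachable server)."""
    with urllib.request.urlopen(url) as response:
        return len(response.read())


def compare_sizes(base_url, dataset_id):
    """Call the endpoint once without and once with the flag."""
    return {
        "default": fetch_size(versions_url(base_url, dataset_id)),
        "excludeMetadataBlocks": fetch_size(
            versions_url(base_url, dataset_id, True)
        ),
    }
```

Against a running instance, `compare_sizes("http://localhost:8080", 42)` (both values are placeholders) should return a smaller byte count for the second entry if the dataset has any metadata blocks.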
Does this PR introduce a user interface change?: No
Is there a release notes update needed for this change?: Maybe, it is a new optional property.
coverage: 22.694% (-0.001%) from 22.695% when pulling bada7941139d905e9f2b86daff5908af86f1216c on johannes-darms:feat/10171-versions-smaller-response into 825ab15220800dc8053d3e6731b62f3a933b0a17 on IQSS:develop.
@GPortas we are experiencing some performance issues with our SPA when a user requests information about dataset versions, particularly for datasets with many versions. Are you experiencing similar problems? We believe that reducing the payload by omitting the metadata would solve our problem, as we can load the metadata with a separate query if needed. What do you think?
I'm not sure if we've experienced issues with the metadata blocks, and if we have, they may have been minor, possibly because we don't tend to add complex metadata block configurations in our test datasets.
It's reasonable to think it will improve performance, as additional queries are omitted. This is somewhat similar to what we did with the files, where we added the optional query parameter called excludeFiles.
This makes me wonder if it might be interesting to create a 'reduced information' endpoint instead of continuing to include parameters for excluding properties in the general endpoint.
Here are the docs for excludeFiles: https://guides.dataverse.org/en/6.4/api/native-api.html#get-version-of-a-dataset
Our problem is only partly caused by the large, complex metadata block; the other cause is the number of versions (we have a dataset where an update is published every day: the file changes but the metadata stays the same). So the payload of this API becomes huge, and by omitting the metadata blocks we can mitigate the problem while still getting the necessary information about versions, without introducing paging.
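A rough offline sketch of why the payload grows this way: every version repeats the same metadata blocks, so the response size scales with versions times metadata size. All data below is synthetic and the field names are illustrative assumptions; nothing here is a measurement from a real installation.

```python
import json

# Synthetic stand-in for a metadata block; real blocks differ in shape.
SAMPLE_BLOCKS = {
    "citation": {
        "displayName": "Citation Metadata",
        "fields": [
            {"typeName": f"field{i}", "value": "x" * 50} for i in range(20)
        ],
    }
}


def make_version(number, metadata_blocks=None):
    """Build one synthetic version entry, optionally with metadata blocks."""
    version = {
        "id": number,
        "versionNumber": number,
        "versionMinorNumber": 0,
        "versionState": "RELEASED",
    }
    if metadata_blocks is not None:
        version["metadataBlocks"] = metadata_blocks
    return version


def payload_size(version_count, metadata_blocks=None):
    """Return the serialized size in bytes of a synthetic versions list."""
    versions = [
        make_version(n, metadata_blocks) for n in range(1, version_count + 1)
    ]
    body = json.dumps({"status": "OK", "data": versions})
    return len(body.encode("utf-8"))


# One version per day for a year, with and without the repeated metadata.
with_blocks = payload_size(365, SAMPLE_BLOCKS)
without_blocks = payload_size(365)
```

In this toy setup the per-version overhead is the full serialized metadata block, which is identical across versions when only the files change.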
I'm not a fan of having different endpoints for more or less the same information. It is more code to maintain and more complicated for the user.
This PR is inspired by the excludeFiles feature and the code is quite similar.
@GPortas at some point we should probably test the SPA against a dataset with lots of versions. We should have datasets like this on the performance cluster.
@johannes-darms can you please merge the latest from develop? We need this anyway and it will trigger a Jenkins run, which is failing. Also, please consider the suggestions I made in my review. Thanks.
Sorry for the delay. I've merged, updated the documentation, and written a simple test. If you need or want more tests, I'm happy to write them.
@johannes-darms, sorry to do this to you again, but we've recently released version 6.5, so we need you to get the latest from Dev into this branch. Thanks for your patience.
No worries, merged without manual effort :)
Sadly, the Jenkins build fails and I cannot access the output. Could you forward the information, @pdurbin?
@johannes-darms Thanks again for your patience.
@johannes-darms I know this PR has already been merged but to answer your question, the latest API test run passed: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10778/8/testReport/
Thanks for sharing, but I cannot access the Jenkins instance.
Yes, I know, sorry, that link was a "note to self".
Related:
- #9916