Extension of API `{id}/versions` and `{id}/versions/{versionId}` with an optional `excludeMetadataBlocks` parameter

Open johannes-darms opened this issue 1 year ago • 6 comments

What this PR does / why we need it:

Extension of the API endpoints {id}/versions and {id}/versions/{versionId} with an optional excludeMetadataBlocks parameter that specifies whether the metadataBlocks should be left out of the output. It defaults to false, so metadata blocks are still listed unless the caller opts out, which preserves backward compatibility. (Note that for a dataset with a large number of versions and/or metadataBlocks, including the metadata blocks can dramatically increase the volume of the output.)

We see slow responses from api/datasets/%s/versions because of the large response body. Most of the information included (the metadata blocks) is not needed, as we just want to display a dropdown list of all available versions.
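As a rough illustration of the dropdown use case, here is a minimal sketch that builds labels from a versions response that carries no metadata blocks. The field names (`data`, `versionNumber`, `versionMinorNumber`, `versionState`) and the sample payload are assumptions made for this example, not output captured from a real installation.

```python
import json

# Hypothetical, trimmed-down response body (assumed shape, no metadataBlocks).
SAMPLE_RESPONSE = """
{"status": "OK", "data": [
  {"id": 12, "versionNumber": 2, "versionMinorNumber": 0, "versionState": "RELEASED"},
  {"id": 7, "versionNumber": 1, "versionMinorNumber": 0, "versionState": "RELEASED"}
]}
"""


def version_labels(response_body: str) -> list[str]:
    """Return one dropdown label per version found in the response body."""
    payload = json.loads(response_body)
    labels = []
    for version in payload.get("data", []):
        state = version.get("versionState", "UNKNOWN")
        major = version.get("versionNumber")
        minor = version.get("versionMinorNumber")
        if major is None:
            # Assumption: an unpublished version has no number yet.
            labels.append(state)
        else:
            labels.append(f"{major}.{minor if minor is not None else 0} ({state})")
    return labels


if __name__ == "__main__":
    for label in version_labels(SAMPLE_RESPONSE):
        print(label)
```

Nothing in this sketch reads the metadata blocks, which is why dropping them from the response should not affect a dropdown like this.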

Which issue(s) this PR closes:

Closes #10171

Suggestions on how to test this: Call the API once with the flag and once without.
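A minimal sketch of that comparison, assuming the parameter is passed as a query string (`?excludeMetadataBlocks=true`) in the same way as excludeFiles. The server URL and dataset id are placeholders, and `versions_url` and `response_size` are helper names invented for this example.

```python
import sys
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def versions_url(server, dataset_id, exclude_metadata_blocks=False):
    """Build the URL of the versions endpoint, optionally with the new flag."""
    url = f"{server.rstrip('/')}/api/datasets/{dataset_id}/versions"
    if exclude_metadata_blocks:
        # Assumption: the flag is sent as a query parameter.
        url += "?" + urlencode({"excludeMetadataBlocks": "true"})
    return url


def response_size(url, api_token=None):
    """Fetch the URL and return the size of the response body in bytes."""
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    with urlopen(Request(url, headers=headers)) as response:
        return len(response.read())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_versions.py <server-url> <dataset-id>
    server, dataset_id = sys.argv[1], sys.argv[2]
    with_blocks = response_size(versions_url(server, dataset_id))
    without_blocks = response_size(versions_url(server, dataset_id, True))
    print(f"with metadata blocks:    {with_blocks} bytes")
    print(f"without metadata blocks: {without_blocks} bytes")
```

For a dataset with many versions, the second number should be noticeably smaller than the first.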

Does this PR introduce a user interface change?: No

Is there a release notes update needed for this change?: Maybe, it is a new optional property.

johannes-darms avatar Aug 19 '24 11:08 johannes-darms

Coverage Status

coverage: 22.694% (-0.001%) from 22.695% when pulling bada7941139d905e9f2b86daff5908af86f1216c on johannes-darms:feat/10171-versions-smaller-response into 825ab15220800dc8053d3e6731b62f3a933b0a17 on IQSS:develop.

coveralls avatar Aug 19 '24 11:08 coveralls

@GPortas we are experiencing some performance issues with our SPA when a user requests information about dataset versions, particularly for datasets with many versions. Are you experiencing similar problems? We believe that reducing the payload by omitting the metadata would solve our problem, as we can load the metadata with another query if needed. What do you think?

johannes-darms avatar Oct 11 '24 09:10 johannes-darms

I'm not sure if we've experienced issues with the metadata blocks, and if we have, they may have been minor, possibly because we don't tend to add complex metadata block configurations in our test datasets.

It's reasonable to think it will improve performance, as additional queries are omitted. This is somewhat similar to what we did with the files, where we added the optional query parameter called excludeFiles.

This makes me wonder if it might be interesting to create a 'reduced information' endpoint instead of continuing to include parameters for excluding properties in the general endpoint.

GPortas avatar Oct 11 '24 10:10 GPortas

Here are the docs for excludeFiles: https://guides.dataverse.org/en/6.4/api/native-api.html#get-version-of-a-dataset

pdurbin avatar Oct 15 '24 18:10 pdurbin

Our problem is only partly caused by the large, complex metadata block. The other cause is the number of versions (we have a dataset where an update is published every day; the file changes but the metadata stays the same). So the payload of this API becomes huge, and by omitting the metadata blocks we can mitigate the problem while still getting the necessary information about versions without introducing paging.

I'm not a fan of having different endpoints for more or less the same information. It is more code to maintain and more complicated for the user.

This PR is inspired by the excludeFiles feature and the code is quite similar.

johannes-darms avatar Oct 16 '24 07:10 johannes-darms

@GPortas at some point we should probably test the SPA against a dataset with lots of versions. We should have datasets like this on the performance cluster.

pdurbin avatar Oct 16 '24 13:10 pdurbin

@johannes-darms can you please merge the latest from develop? We need this anyway and it will trigger a Jenkins run, which is failing. Also, please consider the suggestions I made in my review. Thanks.

pdurbin avatar Dec 16 '24 14:12 pdurbin

Sorry for the delay. I've merged, adapted the documentation, and written a simple test. If you need or want more tests, I'm happy to write them.

johannes-darms avatar Dec 18 '24 14:12 johannes-darms

@johannes-darms, sorry to do this to you again, but we've recently released version 6.5, so we need you to get the latest from Dev into this branch. Thanks for your patience.

sekmiller avatar Jan 03 '25 20:01 sekmiller

No worries, merged without manual effort :)

johannes-darms avatar Jan 07 '25 15:01 johannes-darms

Sadly, the Jenkins build fails and I cannot access the output. Could you forward the information, @pdurbin?

johannes-darms avatar Jan 08 '25 15:01 johannes-darms

@johannes-darms Thanks again for your patience.

sekmiller avatar Jan 08 '25 19:01 sekmiller

@johannes-darms I know this PR has already been merged but to answer your question, the latest API test run passed: https://jenkins.dataverse.org/job/IQSS-Dataverse-Develop-PR/job/PR-10778/8/testReport/

pdurbin avatar Jan 09 '25 20:01 pdurbin

Thanks for sharing, but I cannot access the Jenkins instance.

johannes-darms avatar Jan 10 '25 07:01 johannes-darms

Yes, I know, sorry, that link was a "note to self".

Related:

  • #9916

pdurbin avatar Jan 10 '25 15:01 pdurbin