DSpace
[Port dspace-7_x] Introducing batching with configurable batch size for DOI operations
References
- Fixes https://github.com/DSpace/DSpace/issues/9622
- Improved implementation of initial attempt in https://github.com/DSpace/DSpace/pull/9822
- Main version of this PR: #9869
Description
Instead of a single commit for all operations, the operations can now be committed in batches.
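As a rough illustration of the idea (not the code from this PR), batching with a configurable size can be sketched as below. The class `BatchedDoiSketch`, the method `processInBatches`, and the `commit` callback are hypothetical names chosen for this example. In DSpace the callback would presumably wrap a database commit on the current context, but that wiring and the actual configuration key are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch, not DSpace code: commit work in batches of a configurable size. */
final class BatchedDoiSketch {

    /**
     * Applies {@code operation} to every item and runs {@code commit} after each
     * full batch, plus once more if a partial batch remains at the end.
     *
     * @return the number of commits performed
     */
    static <T> int processInBatches(List<T> items, int batchSize,
                                    Consumer<T> operation, Runnable commit) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int commits = 0;
        int pending = 0; // items processed since the last commit
        for (T item : items) {
            operation.accept(item);
            pending++;
            if (pending == batchSize) {
                commit.run();
                commits++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the trailing partial batch
            commit.run();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        // Example identifiers only; 2500 items with a batch size of 1000.
        List<String> dois = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            dois.add("10.5072/example-" + i);
        }
        List<String> processed = new ArrayList<>();
        int commits = processInBatches(dois, 1000, processed::add, () -> { });
        System.out.println(processed.size() + " DOIs, " + commits + " commits");
    }
}
```

In this sketch a batch size of 1 commits after every item, while a batch size at least as large as the input reproduces the single-commit behaviour, so the value trades memory held per transaction against commit overhead.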
Instructions for Reviewers
Put a couple of thousand DOIs in one of the update statuses. Run with the default code and observe throughput and memory usage, then compare with the batched approach.
Checklist
- NO My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch). ===> Intentionally made it against 7.x for the time being, but happy to port to 8.x or 9.x.
- [x] My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
- [ ] My PR passes Checkstyle validation based on the Code Style Guide.
- [x] My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
- [ ] My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
- [x] My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
- N/A If my PR includes new libraries/dependencies (in any pom.xml), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
- N/A If my PR modifies REST API endpoints, I've opened a separate REST Contract PR related to this change.
- N/A If my PR includes new configurations, I've provided basic technical documentation in the PR itself.
- [x] If my PR fixes an issue ticket, I've linked them together.
@bram-atmire : Could you make a version of this PR against main? It's oftentimes more difficult to forward-port PRs from dspace-7_x because of stricter formatting rules on main. That's why we recommend PRs start against main and then be backported.
@tdonohue I did that just now in https://github.com/DSpace/DSpace/pull/9869
This might collide with #9835, which effectively establishes a fixed batch size of 1.
Just to mention @bram-atmire that we can confirm that this works as expected in a 7.6.1 instance. We have deployed it as we had pressing issues with a huge backlog of DOI updates and it is working as expected.
Hi @bram-atmire, Conflicts have been detected against the base branch. Please resolve these conflicts as soon as you can. Thanks!
@bram-atmire : We discussed this PR as a team in our DSpace Developers Meeting. Since this was linked up as directly related to #9835, we feel that this bug needs to be re-verified now that #9835 has been merged. It's possible that PR has fixed the bug already.
@steph-ieffam has volunteered to help check whether the original bug described in #9622 is still reproducible.