dspace-angular icon indicating copy to clipboard operation
dspace-angular copied to clipboard

Duplicate Detection ported from DSpace CRIS 7 Angular

Open kshepherd opened this issue 3 years ago • 4 comments

This pull request ports the Duplicate Detection framework from DSpace CRIS 7 into DSpace.

Attribution: Ported from DSpace-CRIS and extended by The Library Code with support of Technische Universität Berlin

DSpace Backend PR: https://github.com/DSpace/DSpace/pull/8415 REST Contract PR: TBD

One outstanding issue For a new submission, the default behaviour when no matches are found should be to keep the section hidden, but instead, the default message indicates that decisions are all registered. Fixed, see comment https://github.com/DSpace/dspace-angular/pull/1732#issuecomment-1211459672

To test

  • Make sure both the dspace-angular and DSpace PRs are applied
  • Enable the ‘dedup’ event consumer in dspace.cfg and add the ‘detect-duplicate’ step to a submission process in item-submission.xml
  • For best results, configure autosave on the dc.title field and any others you’re planning to test (eg. dc.identifier.doi, dc.identifier.isbn or so on). This means that whenever a change is saved to dc.title, the duplicate detection section will update with a new list of potential matches
  • Submit an item to a collection with the detect-duplicate step enabled. Enter some text in the title so that

Notes for PR reviewers

  • There is a new ‘dedup’ table in the database, this should be added by flyway
  • There is a new ‘dedup’ Solr core
  • The ‘dedup’ consumer is not in the default event consumers by default, so it’ll need to be added for practical testing
  • The fuzzy search signature is commented out of the deduplication.xml spring config by default, so uncomment that to test fuzzy matches (the default MD5 signatures for exact metadata matches still work)

Notes for 4Science devs (regarding changes from CRIS)

  • SearchDeduplication was renamed to DeduplicationPluginService as that seemed more fitting to what it actually does
  • tmpMapFilter, tmpFilter, other similar names of maps and lists were renamed to suit their purpose (eg. signatureSearchFilters)
  • A lot of unused methods, variables, classes were stripped out during the port to keep code lean and easier to review. If some were partial implementations of future work, we can bring them in with a subsequent PR

As I’ve added an authorization check (configurable, enabled by default) so that the user needs read access to each item in the potential matches list, I manually disabled it for some integration tests and leave it for others, and explicitly test it in a new integration test

I’ve added javadoc and most is pretty straightforward but please correct me if I misunderstood terms such as ‘fake', ‘reader’

Checklist

This checklist provides a reminder of what we are going to look for when reviewing your PR. You need not complete this checklist prior to creating your PR (draft PRs are always welcome). If you are unsure about an item in the checklist, don't hesitate to ask. We're here to help!

  • [ ] My PR is small in size (e.g. less than 1,000 lines of code, not including comments & integration tests). Exceptions may be made if previously agreed upon.
  • [ ] My PR passes Checkstyle validation based on the Code Style Guide.
  • [x] My PR includes Javadoc for all new (or modified) public methods and classes. It also includes Javadoc for large or complex private methods.
  • [ ] My PR passes all tests and includes new/updated Unit or Integration Tests based on the Code Testing Guide.
  • [x] If my PR modifies the REST API, I've linked to the REST Contract page (or open PR) related to this change.

kshepherd avatar Jul 20 '22 02:07 kshepherd

This pull request introduces 1 alert when merging 0a193b8e50d783f17eac77a283543090bbebf6e2 into ea67a157841c82f90b94b4895abe16afb3d26500 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

lgtm-com[bot] avatar Jul 20 '22 03:07 lgtm-com[bot]

This pull request introduces 1 alert when merging 9bbbbfab9300e7a994ddb2c96b45034018cc33aa into f6d2014bf41eac79df39eb11be29241d21112685 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

lgtm-com[bot] avatar Aug 11 '22 01:08 lgtm-com[bot]

I solved the display issue that I had noted in my first PR description. Describing my fix below for reviewers' attention:

There is an action + effect combo “CleanDetectDuplicateSection”. It’s only dispatched when the parseSaveResponse() submission effect is called.

I modified the DetectDuplicateService so that if the workspace item is undefined when getDuplicateMatches()1 is first called (ie. the angular component is initialised and asking for matches but the sectionData is not fully populated yet), the Clean action is dispatched.

kshepherd avatar Aug 11 '22 01:08 kshepherd

This pull request introduces 1 alert when merging aa4bbd104ed3cefe1fec16215dcd12a38f0c6084 into f6d2014bf41eac79df39eb11be29241d21112685 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

lgtm-com[bot] avatar Aug 11 '22 01:08 lgtm-com[bot]

This pull request introduces 1 alert when merging 13d17aabbf2dd8abb5f7161c77fd8dec4f0d5c28 into c876055855b1818515ffce77e7ccb5ea955cba29 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

lgtm-com[bot] avatar Oct 30 '22 23:10 lgtm-com[bot]

This pull request introduces 1 alert when merging 5ab3acd9a51ba22c84878cb59cc529b994bc7b3b into dc4b4ffe4d16daf43b6c7706058b104ed8c25beb - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

lgtm-com[bot] avatar Nov 02 '22 21:11 lgtm-com[bot]

Hi @kshepherd, Conflicts have been detected against the base branch. Please resolve these conflicts as soon as you can. Thanks!


This message is automatically generated by prince-chrismc/label-merge-conflicts-action so don't hesitate to report issues/improvements there.

github-actions[bot] avatar Dec 09 '22 17:12 github-actions[bot]

Thanks @kshepherd!

Like @benbosman on the backend, I'm not assigned as a reviewer on this PR, but I gave it a quick test. A few comments and questions:

  • It would be good to have a link (in a new window) to the full metadata page for a potential duplicate. In the list view you only have a handful of fixed metadata fields
  • The action buttons on duplicate results are stuck together. I'd add a bootstrap spacing class such as mr-1, or put them in a button group
  • There is a "Submitter decision" field when you look at the workflowitem as an editor, but it seems that if I mark something as a duplicate as the submitter, that duplicate doesn't show up for the editor, and if I mark it as not a duplicate, it shows up but still says "no decision yet".
  • What happens after I've "marked an item to merge" as an editor? Is there a way to merge those items from the UI?

artlowel avatar Jan 19 '23 14:01 artlowel

We are willing to follow this up if it is clear who and when it is being reviewed. We rebased and fixed this PR several times. We would do this one more time if there is a concrete plan how this is being reviewed. We will not rebase again and again until it is clear who and when this will be reviewed. Please ping us or contact us otherwise if you are interested in reviewing it.

pnbecker avatar Jan 26 '23 15:01 pnbecker

Hi @kshepherd, Conflicts have been detected against the base branch. Please resolve these conflicts as soon as you can. Thanks!


This message is automatically generated by prince-chrismc/label-merge-conflicts-action so don't hesitate to report issues/improvements there.

github-actions[bot] avatar Feb 02 '23 16:02 github-actions[bot]

We took a a new simpler and clean approach. Closing this in favor for #2749

pnbecker avatar Jan 18 '24 09:01 pnbecker