hyrax icon indicating copy to clipboard operation
hyrax copied to clipboard

Race condition wtih VisibilityCopyJob and InheritPermissionsJob

Open conorom opened this issue 3 years ago • 0 comments

Descriptive summary

I feel like this has probably been obvious to others for quite a while, and maybe the bug label is a bit severe, but I've fallen on the side of using it anyway.

Everywhere InheritPermissionsJob is called, VisibilityCopyJob is also run. The one exception is in Hyrax::PermissionsController.copy(), which is 100% unreachable, dead code. That has been partially removed in #4650, but I've made #5826 just now to get rid of it altogether as its existence has really delayed me in writing this issue.

Anyway, both of these jobs begin looping over all of a Work's FileSets/members [1] [2], potentially pulling/saving the loop FileSets variables at random times relative to each other, where more than one background worker is running. And I presume using more than one background worker is the norm.

I've seen FileSets left in the wrong visibility state on fulcrum.org, and it can only be because of these jobs being run simultaneously. In quite a few cases this has rendered ebooks unreadable by the public for long periods of time. Note that the incorrect state ended up in Solr only. The actual value in Fedora was correct.

Affects HEAD and all versions.

Rationale

Jobs that are always run together should not be set up to loop over and save the same FileSets at the same time. Perhaps the jobs should be merged, perhaps one should call the other, just like CharacterizeJob kicks off CreateDerivativesJob when it is done. I understand this is a different situation, however, as neither InheritPermissionsJob nor VisibilityCopyJob rely on each other's output.

It would still work as a solution though.

side note: even if FileSet locking were set up (#5558), it makes no sense to purposefully call saving loops like this. It would only create guaranteed failed jobs over time, which admittedly is better than the result of incorrect data which goes invisible.

Expected behavior

Don't purposefully work on and save the same bunch of FileSets in two loops at the same time.

Actual behavior

With more than one background work we, essentially, purposefully work on and save the same bunch of FileSets in two loops at the same time in InheritPermissionsJob and VisibilityCopyJob.

Steps to reproduce the behavior

On a setup with multiple background workers, perform any action that calls InheritPermissionsJob and VisibilityCopyJob:

  • "Apply Changes to Contents?" screen (work edit where visibility was altered)
  • batch edits controller
  • adding a FileSet to a work in FileSetActor.

See the results here: https://github.com/samvera/hyrax/search?q=inheritpermissionsjob

If your work has quite a few FileSets (100+, say) I presume it is more likely to happen as the simultaneous loops will take more time working on the loop variables.

Related work

#5558

conorom avatar Aug 30 '22 21:08 conorom