
Deleting Works not removing list_source and indirect_containers from Solr

Open roryegerton opened this issue 9 years ago • 5 comments

We currently have the situation below, where file_sets are members of works:

bw = BibliographicWork.new
bw.save

fs = BibliographicFileSet.new
fs.save

bw.ordered_members << fs
bw.save

This creates the following objects/documents in Fedora and Solr:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::Aggregation::Proxy
1 - ActiveFedora::IndirectContainer
1 - BibliographicWork
1 - BibliographicFileSet

When I destroy the FileSet, it deletes itself and its proxy from Fedora and Solr:

fs.destroy

This leaves the following objects/documents in Fedora and Solr:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::IndirectContainer
1 - BibliographicWork

So far so good, this is all as expected.

However, when I destroy the Work, I am still left with the IndirectContainer and the ListSource in Solr:

bw.destroy

This leaves Solr documents for:

1 - ActiveFedora::Aggregation::ListSource
1 - ActiveFedora::IndirectContainer

In Fedora these aren't accessible, as the parent of both objects is a tombstone.

As a workaround for now, to ensure that everything is deleted, I can run the following commands to delete the indirect_container and list_source before destroying the work:

bw.list_source.destroy
# We know there is only one; otherwise we'd loop through each.
indirect_container = ActiveFedora::IndirectContainer.where(id: "#{bw.id}/members").first
indirect_container.destroy
bw.destroy
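
For reuse, the same steps could be wrapped in a small helper. This is only a sketch: `destroy_work_with_aggregations` and its `container_class:` argument are hypothetical names, not part of Hydra::Works or ActiveFedora, and it assumes the container id follows the `"#{work.id}/members"` pattern used above.

```ruby
# Hypothetical helper (not a Hydra::Works or ActiveFedora API): destroys a
# work's aggregation resources before the work itself, so that each one is
# removed from Solr as well as from Fedora.
#
# work            - anything responding to #id, #list_source and #destroy
# container_class - anything responding to .where(id:) and returning an
#                   enumerable of destroyable objects; in an application
#                   this would be ActiveFedora::IndirectContainer
def destroy_work_with_aggregations(work, container_class:)
  # Remove the ListSource that backs ordered_members.
  work.list_source.destroy

  # Remove the members container. The "<work id>/members" id is an
  # assumption carried over from the workaround above.
  container_class.where(id: "#{work.id}/members").each(&:destroy)

  # Finally remove the work itself.
  work.destroy
end
```

In an application the call would look like `destroy_work_with_aggregations(bw, container_class: ActiveFedora::IndirectContainer)`.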

I wonder whether this should be done in the PCDM models instead?

roryegerton avatar Feb 25 '16 17:02 roryegerton

So the problem is that the Solr document is left behind, yes? Because that indirect container should be very deleted from Fedora.

tpendragon avatar Feb 25 '16 17:02 tpendragon

Yes, when I delete the Work there is still an IndirectContainer and an Aggregation::ListSource document in Solr.

roryegerton avatar Feb 25 '16 17:02 roryegerton

This is a much more generic problem than Hydra::PCDM. Effectively this is a failure in the sync of Fedora -> Solr. When I delete a root node in Fedora, it deletes all its contained resources as well. However, we don't do anything in that regard in ActiveFedora (with good reason - that'd be slooooww), so the Solr documents for those contained resources stay in place. So how do we deal with it? Do we deal with it?

Is this something we should be eating up the event stream from Fedora to do?
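
To make the mismatch concrete, here is a toy model in plain Ruby. `ToyRepository` and `ToyIndex` are hypothetical in-memory stand-ins, not Fedora, Solr, or ActiveFedora APIs, and the ids are illustrative. It assumes an event consumer would receive at least the path of the deleted root node.

```ruby
require "set"

# Stand-in for Fedora: deleting a node also deletes everything contained
# under it, then notifies any registered listeners of the deleted path.
class ToyRepository
  attr_reader :paths

  def initialize(paths)
    @paths = Set.new(paths)
    @listeners = []
  end

  # Register a block that is called with the path of each deleted root node.
  def on_delete(&block)
    @listeners << block
  end

  def delete(path)
    @paths.delete_if { |p| p == path || p.start_with?("#{path}/") }
    @listeners.each { |listener| listener.call(path) }
  end
end

# Stand-in for Solr: a flat set of document ids.
class ToyIndex
  attr_reader :ids

  def initialize(ids)
    @ids = Set.new(ids)
  end

  # Current behaviour: only the destroyed object's own document is removed.
  def delete_by_id(id)
    @ids.delete(id)
  end

  # Event-driven alternative: remove the document and all of its descendants.
  def delete_tree(id)
    @ids.delete_if { |doc_id| doc_id == id || doc_id.start_with?("#{id}/") }
  end
end

ids = ["bw1", "bw1/members", "bw1/list_source"]

# Without a listener, the child documents are left behind.
stale = ToyIndex.new(ids)
stale.delete_by_id("bw1")

# With a listener on the repository, the index follows the recursive delete.
repo   = ToyRepository.new(ids)
synced = ToyIndex.new(ids)
repo.on_delete { |path| synced.delete_tree(path) }
repo.delete("bw1")
```

Here `stale` still holds the two child documents while `synced` is empty. Against a real index the `delete_tree` step would presumably become a delete-by-query on an id prefix, which moves the cost out of ActiveFedora's destroy path.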

tpendragon avatar Feb 25 '16 17:02 tpendragon

Is there any use in them going into Solr in the first place? Do these have to be AF::Base objects?

jcoyne avatar Feb 25 '16 19:02 jcoyne

The list source being in Solr is used for a query, I think, but maybe it doesn't have to be?

tpendragon avatar Feb 29 '16 18:02 tpendragon