hydra-head Permissions don't work after a complete reindex

reindex_everything must be invoked twice to get permissions into solr

Feb 21 '17 17:02 hackartisan

This may be unfixable, because it's doing a solr query to get the permissions objects (first indexing pass) in order to write the permissions onto the actual object (second pass)

Feb 21 '17 17:02 jcoyne

Is this because there's no way to query the repo by type, and the permissions objects point to the objects they govern?

Feb 21 '17 17:02 barmintor

So the second time through it doesn't actually need to index permissions objects. The indexing job could add a step that queries the index itself instead of the repo, and only updates the index for non-permissions objects. Does that sound right?
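
A minimal sketch of that idea, as a plain-Ruby simulation rather than real ActiveFedora or Solr calls. `REPO`, `INDEX`, `index_object`, and the field names are all hypothetical stand-ins; the only point is that the second pass is driven from the index and skips permissions objects.

```ruby
# Plain-Ruby simulation; no ActiveFedora or Solr involved.
# REPO, INDEX, index_object and the field names are hypothetical.
REPO = {
  'work1' => { model: 'Work', title: 'A work' },
  'work1-perm' => { model: 'Permission', access_to: 'work1', agent: 'alice' }
}.freeze

INDEX = {}

# Index one object. A work reads its permissions from INDEX (standing in for
# the Solr query), so the result depends on what has been indexed already.
def index_object(id)
  obj = REPO.fetch(id)
  doc = { 'id' => id, 'model' => obj[:model] }
  if obj[:model] == 'Permission'
    doc['access_to'] = obj[:access_to]
    doc['agent'] = obj[:agent]
  else
    perms = INDEX.values.select { |d| d['access_to'] == id }
    doc['read_access'] = perms.map { |d| d['agent'] }
  end
  INDEX[id] = doc
end

# First pass: everything, in repository order (the work happens to come first).
def first_pass
  REPO.each_key { |id| index_object(id) }
end

# Second pass: ask the index which documents are not permissions objects
# and reindex only those.
def second_pass
  ids = INDEX.values.reject { |d| d['model'] == 'Permission' }.map { |d| d['id'] }
  ids.each { |id| index_object(id) }
  ids
end

first_pass
after_first = INDEX['work1']['read_access'].dup
reindexed = second_pass
after_second = INDEX['work1']['read_access']
```

In this toy run the work has no read access after the first pass and picks it up after the second, while the permissions object is indexed only once.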

Feb 23 '17 16:02 hackartisan

@hackmastera only if you use the default indexers. You might have an indexer that uses a value out of the Fedora model to conditionally create a solr document. Thus, you are unable to derive the next solr document just from the last solr document.

Feb 23 '17 18:02 jcoyne

@jcoyne interesting; if you know of or could think of an example, it'd be helpful. A conditional that would not add the object on the first pass but would add the object on the second pass?

I still think this could be a useful way to do it to keep from running the entire thing twice; you'd run it more like 1.5 times. But I guess if you had a case like the above, you'd be in an even worse situation than before because you'd still have to run the whole job again.

Feb 23 '17 21:02 hackartisan

There's an ordering problem involved here too. Let's say we have these models:

class Library < ActiveFedora::Base
  has_many :books
end

class Book < ActiveFedora::Base
  belongs_to :library
  property :title
end

Now let's say the to_solr method for Library wants all the book titles:

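def to_solr(doc)
  # Add the titles to the document returned by super, and return that document
  super(doc).tap do |solr_doc|
    solr_doc['book_titles'] = books.map(&:title)
  end
end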

This works fine, so long as we can guarantee the books are indexed before the Library. If the books are indexed after the Library, the library will have an incomplete set of titles in book_titles. Thus we need a two-pass index: once to build the relationships, and a second time to do any of the other indexing.
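
To make the ordering dependency concrete, here is a toy simulation in plain Ruby. It assumes that the has_many lookup only sees books that are already in the index. `ToyIndex`, `ToyBook`, and `ToyLibrary` are hypothetical names, not ActiveFedora classes, and the titles are made-up data.

```ruby
# Hypothetical in-memory stand-ins; nothing here is ActiveFedora API.
ToyBook = Struct.new(:id, :title, :library_id)
ToyLibrary = Struct.new(:id)

class ToyIndex
  attr_reader :docs

  def initialize
    @docs = {}
  end

  # Build and store a document for one object.
  def index(obj)
    doc = { 'id' => obj.id }
    case obj
    when ToyBook
      doc['title'] = obj.title
      doc['library_id'] = obj.library_id
    when ToyLibrary
      # Membership is resolved from what is already in the index,
      # so books indexed later are missed.
      doc['book_titles'] = @docs.values
                                .select { |d| d['library_id'] == obj.id }
                                .map { |d| d['title'] }
    end
    @docs[obj.id] = doc
  end
end

objects = [
  ToyLibrary.new('lib1'),
  ToyBook.new('book1', 'Dune', 'lib1'),
  ToyBook.new('book2', 'Emma', 'lib1')
]

toy_index = ToyIndex.new

# Pass 1: the library is indexed before its books.
objects.each { |o| toy_index.index(o) }
titles_after_pass1 = toy_index.docs['lib1']['book_titles']

# Pass 2: the books are now in the index, so the library sees them.
objects.each { |o| toy_index.index(o) }
titles_after_pass2 = toy_index.docs['lib1']['book_titles']
```

Here the library's book_titles is empty after the first pass and complete after the second, which is the two-pass behaviour described above.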

Feb 23 '17 21:02 jcoyne

@jcoyne good point.

Feb 23 '17 21:02 hackartisan

@jcoyne but my proposed second pass would catch that, since it would reindex everything that isn't a permissions object.

Feb 23 '17 21:02 hackartisan

@hackmastera I added a solution to our sufia6 instance of ScholarSphere a while back for this very issue. Not sure if I am in love with it, but here it is: https://github.com/psu-stewardship/scholarsphere/blob/master/app/jobs/resolrize_job.rb

I used the id length to determine which objects were permission items (since those have very long ids) and then indexed those first.

It does not take into account @jcoyne's member issue though.
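
The id-length ordering can be sketched roughly as follows. This is not the code from the linked job: the threshold, the method name `ordered_for_reindex`, and the sample ids are assumptions made up for illustration.

```ruby
# Hypothetical threshold; the real cut-off depends on how ids are minted.
LONG_ID_THRESHOLD = 20

# Return the ids with the long ones (assumed to be permissions objects)
# first, keeping the original relative order within each group.
def ordered_for_reindex(ids, threshold: LONG_ID_THRESHOLD)
  long_ids, short_ids = ids.partition { |id| id.length > threshold }
  long_ids + short_ids
end

# Made-up sample ids: two short work ids and two long permission ids.
sample_ids = [
  'ab12cd34e',
  '6f1c2b7e-0d3a-4c55-9e21-7b8a4d0c1f22',
  'zz98yx76w',
  '0a9b8c7d-1e2f-4a3b-8c4d-5e6f7a8b9c0d'
]

ordered = ordered_for_reindex(sample_ids)
```

Each id in `ordered` would then be handed to whatever reindexes a single object, so the permissions objects are in the index before the works that query for them.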

Feb 24 '17 12:02 carolyncole

@cam156 thanks, that's a good idea, and in my case I think it could work. I do have nested attribute objects with the longer IDs, but it should be fine to index those before the works since the relationship is stored on the work side. I'll try this out.

Feb 24 '17 15:02 hackartisan