A-Z sort in collections facet not working correctly

Open caaster opened this issue 7 years ago • 12 comments

In the following exhibit: https://exhibits.stanford.edu/data

The A-Z sort is not working properly in the collection facet: https://exhibits.stanford.edu/data/catalog?q=&search_field=searchUUIDa791febd-f7aa-43e1-97ab-26519cec01d3

caaster avatar Nov 01 '17 16:11 caaster

Another example, just to be clear it's not likely related to the exhibit itself (screenshot is after selecting the A-Z sort option):

[screenshot: The Bob Fitch Photography Archive, Spotlight at Stanford search results]

ggeisler avatar Nov 01 '17 16:11 ggeisler

Ah -- collection is actually a compound field that includes a druid prefix. We'll have to consider alternative ways of indexing to support this use case.
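
A minimal sketch of why a druid-prefixed value defeats an A-Z sort. The `-|-` separator, the druids, and the titles below are assumptions made up for illustration; the thread only says the field is compound and includes a druid prefix.

```python
# Hypothetical compound facet values: "<druid><separator><title>".
# The separator and the druids are invented for this example.
SEPARATOR = "-|-"

compound_values = [
    "zz999zz9999" + SEPARATOR + "Apple Collection",
    "bb222bb2222" + SEPARATOR + "Zebra Collection",
    "mm555mm5555" + SEPARATOR + "Maple Collection",
]


def title_of(value: str) -> str:
    """Return the title part of a compound value (the text after the separator)."""
    _druid, _sep, title = value.partition(SEPARATOR)
    return title


# Sorting the raw values orders them by druid, so the displayed titles look unsorted.
sorted_by_raw_value = [title_of(v) for v in sorted(compound_values)]

# Sorting by the extracted title gives the A-Z order a user expects.
sorted_by_title = sorted(title_of(v) for v in compound_values)

print(sorted_by_raw_value)
print(sorted_by_title)
```

If the index sorts on the whole stored value, the order follows the druid rather than the title, which would match the behaviour reported above.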

cbeer avatar Nov 16 '17 16:11 cbeer

Re-opening this. After a few convos, there is some concern about breaking browse categories that have been created from this field already.

Another approach, proposed by @cbeer, is to index the collection title into a new field and use that for the facet. I believe we would still need to keep the old field/data around to support existing browse categories, but that facet would not need to be shown to users/admins; it would just need to be queryable by Solr. @cbeer please feel free to fill in any additional details that might be pertinent.

jkeck avatar Dec 19 '17 18:12 jkeck

Refer to exhibits #1037 - issue sub-task.

caaster avatar May 09 '18 17:05 caaster

In case it's helpful to wrap-up a little analysis here....

We have 727,309 documents that have the old collection_with_title field (solr query). We have 715,686 documents that have the new collection_titles_ssim field (solr query).

This leaves a difference of 11,623 documents that need to be re-indexed so they have the collection_titles_ssim field (solr query / csv of IDs).
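
For reference, a rough sketch of how such a query could be built. The host, core name, and `rows` value are placeholders (the real Solr URL is obfuscated in this thread), and the linked queries may have been written differently; only the two field names and the counts come from the comment above.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the real production Solr host is not shown in this thread.
SOLR_SELECT_URL = "https://solr.example.org/solr/exhibits/select"


def missing_new_field_params(old_field: str, new_field: str) -> dict:
    """Build Solr params matching documents that have old_field but lack new_field."""
    return {
        "q": f"{old_field}:[* TO *]",    # documents where the old field has any value
        "fq": f"-{new_field}:[* TO *]",  # ...excluding documents that have the new field
        "fl": "id",                      # return only the document IDs
        "wt": "csv",                     # CSV response, e.g. for a list of IDs
        "rows": 20000,                   # assumed page size, larger than the expected count
    }


params = missing_new_field_params("collection_with_title", "collection_titles_ssim")
url = SOLR_SELECT_URL + "?" + urlencode(params)

# The counts quoted above: documents with the old field minus documents with the new one.
documents_to_reindex = 727_309 - 715_686

print(url)
print(documents_to_reindex)
```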

Once we re-index all 11,623 documents, it might be possible to configure the existing collection_with_title facet field to use collection_titles_ssim as the field name (retaining collection_with_title as the key) so that existing facet configurations, saved searches, etc. continue to work (but we will need to validate that).

jkeck avatar Mar 12 '21 18:03 jkeck

@jkeck the csv of IDs link doesn't seem to go to the csv file....

caaster avatar Mar 12 '21 18:03 caaster

I've obfuscated our solr host URL in all of the links, but if you replace that with our real production server you can get the CSV. Happy to ping you that privately on slack if you would like.

jkeck avatar Mar 12 '21 19:03 jkeck

If we agree that reindexing the affected exhibits is a good course of action, I can reach out to creators to obtain permission to do so, although this will take some time/coordination. So let's discuss this and decide if we have the information we need to be able to make the decision. With help from Jessie, here is the list.

image

caaster avatar Mar 12 '21 20:03 caaster

This was just reported again (4/15/2022), specifically for the data exhibit:

Screen Shot 2022-04-15 at 9 12 01 AM

anarchivist avatar Apr 15 '22 16:04 anarchivist

  • [x] Need a new list of impacted collections. (Do this now.)
  • [ ] Cathy would need time to reach out to exhibit creators to get permission to re-index.
  • [ ] Then we could re-index the collections
  • [ ] Switch the app's collection_with_title configuration to use the new collection_titles_ssim field. We can only do this if we reindex all exhibits. If we don't or can't reindex all exhibits we'll need to treat this as a new field and change the config for each exhibit to use the new field for collection faceting.
  • [ ] If we change collection_with_title to use the new collection_titles_ssim field, we also need to do something (in a search builder?) so that existing browse category searches that include the druid prefixed to the title continue to work with the new field, presumably by stripping the druid from the query before sending it to Solr. Alternatively, we could try to update the configured searches/browse categories.
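
The druid-stripping idea in the last item could look roughly like the sketch below. It is written in Python purely as an illustration (the app's search builder would be Ruby), and the `-|-` separator, the function names, and the parameter shape are assumptions, not the app's actual code.

```python
import re

# Assumed shape of an old facet value: a druid (2 letters, 3 digits, 2 letters,
# 4 digits), a hypothetical "-|-" separator, then the collection title.
DRUID_PREFIX = re.compile(r"^[a-z]{2}\d{3}[a-z]{2}\d{4}-\|-")


def strip_druid_prefix(value: str) -> str:
    """Drop a leading druid prefix; values without one are returned unchanged."""
    return DRUID_PREFIX.sub("", value, count=1)


def rewrite_facet_params(facet_params: dict, key: str = "collection_with_title") -> dict:
    """Return a copy of the facet params with druid prefixes removed for `key`."""
    rewritten = dict(facet_params)
    if key in rewritten:
        rewritten[key] = [strip_druid_prefix(v) for v in rewritten[key]]
    return rewritten


# A saved browse-category search that still carries the old compound value.
saved_search = {"collection_with_title": ["ab123cd4567-|-Example Collection"]}
print(rewrite_facet_params(saved_search))
```

Leaving unprefixed values untouched means searches saved after the switch would pass through unchanged.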

corylown avatar Jul 25 '24 16:07 corylown

@caaster here's a current report of exhibits that would need to be re-indexed in order for us to fix the A-Z sort of the Collection facet:

Public exhibits:

data
exemplars
harrison
stanford-senate
oral-history
preventing-genocide
women-art-revolution
stanford-band
su-photos
hanna-house-collection
menuez
gatt
ua-maps-drawings
women
kzsu
community
tel-aviv
news-service
time-travel-japan
views-portraying-place-space
stanford-stories
lanciani
maps-of-africa
stanford-pubs
stanfords
ruderman
renaissance-exploration
mss
rare-music
chinese-ngos
heckrotte
activism
queer
cs
feigenbaum
ai

Exhibits that require auth:

h2-test
digital-bookplates
memorialchurch
gordon-maps

Deleted(?) exhibits (can be ignored, but included for completeness):

alcoa
bach-test
my-favorite-cats
kolb

corylown avatar Jul 25 '24 19:07 corylown

Thank you @corylown. I will craft an email to send to all exhibit creators, send it, and nudge everyone until they all reply. Stay tuned! This will take time, and as I said, it's good to get started now.

caaster avatar Jul 25 '24 21:07 caaster

@corylown I have now reached all exhibit curators. I have a small list of exhibits to exclude from reindexing; here are the slugs: ai, ruderman, gordon-maps, maps-of-africa, views-portraying-place-space

All other exhibits listed above can be reindexed. Looking forward to this bug being fixed!

caaster avatar Nov 15 '24 21:11 caaster

So the list of exhibits to be reindexed is:

  • [x] data
  • [x] exemplars
  • [x] harrison
  • [x] stanford-senate
  • [x] oral-history
  • [x] preventing-genocide
  • [x] women-art-revolution
  • [x] stanford-band
  • [x] su-photos
  • [x] hanna-house-collection
  • [x] menuez
  • [x] gatt
  • [x] ua-maps-drawings
  • [x] women
  • [x] kzsu
  • [x] community
  • [x] tel-aviv
  • [x] news-service
  • [x] time-travel-japan
  • [x] stanford-stories
  • [x] lanciani
  • [x] stanford-pubs
  • [x] stanfords
  • [x] renaissance-exploration
  • [x] mss
  • [x] rare-music
  • [x] chinese-ngos
  • [x] heckrotte
  • [x] activism
  • [x] queer
  • [x] cs
  • [x] feigenbaum

corylown avatar Nov 26 '24 16:11 corylown

This A-Z sort issue should now be resolved for most exhibits. There are three exhibits that still use the old collection name field because they cannot be reindexed:

views-portraying-place-space, chinese-ngos, gordon-maps

The old collection name field now shows up as Collection (deprecated). Old browse categories built using the old field should still work.

corylown avatar Dec 18 '24 14:12 corylown