exhibits
A-Z sort in collections facet not working correctly
In the following exhibit: https://exhibits.stanford.edu/data
The A-Z sort is not working properly in the collection facet: https://exhibits.stanford.edu/data/catalog?q=&search_field=searchUUIDa791febd-f7aa-43e1-97ab-26519cec01d3
Another example, just to be clear it's not likely related to the exhibit itself (screenshot is after selecting the A-Z sort option):

Ah -- collection is actually a compound field that includes a druid prefix. We'll have to consider alternative ways of indexing to support this use case.
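To illustrate why a compound value breaks the A-Z sort, here is a minimal sketch. The `druid-|-title` layout, the separator, and the sample druids and titles are all assumptions made up for illustration; they are not values taken from the index.

```python
# Hypothetical facet values: the "druid-|-title" layout and the druids are
# assumptions for illustration, not values from the production index.
SEPARATOR = "-|-"

compound_values = [
    "zz999zz9999-|-Apple Collection",
    "mm555mm5555-|-Banana Collection",
    "aa111aa1111-|-Cherry Collection",
]


def title_of(compound_value: str) -> str:
    """Return the text after the separator, or the whole value if there is none."""
    _druid, sep, title = compound_value.partition(SEPARATOR)
    return title if sep else compound_value


# Sorting the raw values orders them by druid, so the displayed titles look shuffled.
titles_sorted_by_compound = [title_of(v) for v in sorted(compound_values)]

# Sorting on the title alone gives the A-Z order users expect.
titles_sorted_by_title = sorted(title_of(v) for v in compound_values)
```

If the facet only displays the title portion but sorts on the full indexed value, the first ordering is what users would see.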
Re-opening this. After a few convos, there is some concern about breaking browse categories that have been created from this field already.
Another approach proposed by @cbeer is to index the collection title into a new field and use that for the facet. I believe we would still need to keep the old field/data around to support existing browse categories, but the old facet would not need to be shown to users/admins; it would just need to be queryable by Solr. @cbeer please feel free to fill in any additional details that might be pertinent.
Refer to exhibits #1037 - issue sub-task.
In case it's helpful to wrap-up a little analysis here....
We have 727,309 documents that have the old collection_with_title field (solr query)
We have 715,686 documents that have the new collection_titles_ssim field (solr query)
This leaves a difference of 11,623 documents that need to be re-indexed so they have the collection_titles_ssim field (solr query / csv of IDs).
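For anyone wanting to reproduce this analysis, here is a rough sketch of how the difference and the "has the old field but not the new one" query could be put together. The host, core name, and function name are placeholders (the real Solr URL is not shared in this thread); the query uses standard Solr `q`/`fq`/`fl`/`rows`/`wt` parameters.

```python
from urllib.parse import urlencode

# Counts reported in the analysis above.
docs_with_old_field = 727_309
docs_with_new_field = 715_686
docs_to_reindex = docs_with_old_field - docs_with_new_field  # 11,623

# Placeholder host and core; the real production Solr URL is not in this thread.
SOLR_SELECT_URL = "https://solr.example.org/solr/exhibits/select"


def missing_new_field_url(rows: int, wt: str = "json") -> str:
    """Build a select URL for documents that have the old field but lack the new one."""
    params = {
        "q": "collection_with_title:*",     # has the old compound field
        "fq": "-collection_titles_ssim:*",  # does not have the new title field
        "fl": "id",                         # only return document IDs
        "rows": rows,
        "wt": wt,
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


# rows=0 is enough to read numFound; ask for all rows as CSV to get the ID list.
count_url = missing_new_field_url(rows=0)
csv_url = missing_new_field_url(rows=docs_to_reindex, wt="csv")
```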
It might be possible that once we re-index all 11,623 documents that we could configure the existing collection_with_title facet field to use the collection_titles_ssim as the field name (and retain collection_with_title as the key) so that existing facet configurations/saved searches/etc would continue to work (but will need to validate that).
@jkeck the csv of IDs link doesn't seem to go to the csv file....
I've obfuscated our solr host URL in all of the links, but if you replace that with our real production server you can get the CSV. Happy to ping you that privately on slack if you would like.
If we agree that reindexing the affected exhibits is a good course of action, I can reach out to creators to obtain permission to do so, although this will take some time/coordination. So let's discuss this and decide if we have the information we need to be able to make the decision. With help from Jessie, here is the list.
This was just reported again (4/15/2022), specifically for the data exhibit:

- [x] Need a new list of impacted collections. (Do this now.)
- [ ] Cathy would need time to reach out to exhibit creators to get permission to re-index.
- [ ] Then we could re-index the collections
- [ ] Switch the app's collection_with_title configuration to use the new collection_titles_ssim field. We can only do this if we reindex all exhibits. If we don't or can't reindex all exhibits we'll need to treat this as a new field and change the config for each exhibit to use the new field for collection faceting.
- [ ] If we change collection_with_title to use the new collection_titles_ssim field we also need to: Do something (in a search builder?) so that existing browse category searches that include the druid prefixed to the title continue to work with the new field (presumably by stripping the druid from the query before sending it to Solr). I guess we could also try to update the configured searches/browse categories.
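As a rough illustration of the last item (stripping the druid from saved browse category queries before they reach Solr), here is a minimal sketch of the string handling only. It is not a Blacklight search builder; the function names, the `-|-` separator, and the shape of the saved facet parameters are assumptions for illustration.

```python
import re

# Assumed prefix layout: an 11-character druid (2 letters, 3 digits, 2 letters,
# 4 digits) followed by a "-|-" separator. The separator is an assumption.
DRUID_PREFIX = re.compile(r"^[a-z]{2}\d{3}[a-z]{2}\d{4}-\|-")


def strip_druid_prefix(value: str) -> str:
    """Drop a leading druid prefix, leaving values without one untouched."""
    return DRUID_PREFIX.sub("", value, count=1)


def rewrite_facet_params(facet_params: dict, field: str = "collection_with_title") -> dict:
    """Return a copy of saved facet params with druid prefixes removed for one field."""
    rewritten = dict(facet_params)
    if field in rewritten:
        rewritten[field] = [strip_druid_prefix(v) for v in rewritten[field]]
    return rewritten


# Hypothetical saved search from a browse category built on the old field.
saved = {"collection_with_title": ["ab123cd4567-|-Example Collection"], "format": ["Map"]}
rewritten = rewrite_facet_params(saved)
```

Rewriting at query time like this would leave the stored browse categories unchanged; updating the stored searches instead is the alternative mentioned above.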
@caaster here's a current report of exhibits that would need to be re-indexed in order for us to fix the A-Z sort of the Collection facet:
Public exhibits:
data
exemplars
harrison
stanford-senate
oral-history
preventing-genocide
women-art-revolution
stanford-band
su-photos
hanna-house-collection
menuez
gatt
ua-maps-drawings
women
kzsu
community
tel-aviv
news-service
time-travel-japan
views-portraying-place-space
stanford-stories
lanciani
maps-of-africa
stanford-pubs
stanfords
ruderman
renaissance-exploration
mss
rare-music
chinese-ngos
heckrotte
activism
queer
cs
feigenbaum
ai
Exhibits that require auth:
h2-test
digital-bookplates
memorialchurch
gordon-maps
deleted? exhibits (can ignore, but including for completeness)
alcoa
bach-test
my-favorite-cats
kolb
Thank you @corylown. I will craft an email to send to all exhibit creators, send it, and nudge everyone until they all reply. Stay tuned! This will take time, and as I said good to get started now.
@corylown I have now reached all exhibit curators. I have a small list of exhibits to exclude from reindexing; here are the slugs: ai, ruderman, gordon-maps, maps-of-africa, views-portraying-place-space
All other exhibits listed above can be reindexed. Looking forward to this bug being fixed!
So the list of exhibits to be reindexed is:
- [x] data
- [x] exemplars
- [x] harrison
- [x] stanford-senate
- [x] oral-history
- [x] preventing-genocide
- [x] women-art-revolution
- [x] stanford-band
- [x] su-photos
- [x] hanna-house-collection
- [x] menuez
- [x] gatt
- [x] ua-maps-drawings
- [x] women
- [x] kzsu
- [x] community
- [x] tel-aviv
- [x] news-service
- [x] time-travel-japan
- [x] stanford-stories
- [x] lanciani
- [x] stanford-pubs
- [x] stanfords
- [x] renaissance-exploration
- [x] mss
- [x] rare-music
- [x] chinese-ngos
- [x] heckrotte
- [x] activism
- [x] queer
- [x] cs
- [x] feigenbaum
This A-Z sort issue should now be resolved for most exhibits. There are three exhibits that still use the old collection name field because they cannot be reindexed:
views-portraying-place-space, chinese-ngos, gordon-maps
The old collection name field now shows up as Collection (deprecated). Old browse categories built using the old field should still work.