collectionsonline
Provide list of synonyms
@jamie and Emily to provide a final list of synonyms.
@SimonLab what format do we want these in and has it been coded to deal with multiple synonyms pointing to one term?
I was thinking (one <-- many):
- "actual term to search for / what is already in the index" <-- "Search terms entered by user"
- "London and North Eastern Railway" <-- "LNER" (what the user searches for)
- "Locomotive" <-- [ "train", "steam engine"]
Could supply a list as JSON or CSV (with the first value being the canonical term)
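For illustration, a minimal sketch of how the CSV variant could be read, assuming the first value in each row is the canonical term and the rest are the search terms users type. The CSV content and the `build_lookup` name are hypothetical, not something already in the codebase.

```python
import csv
import io

# Hypothetical CSV content: the first value in each row is the canonical term,
# the remaining values are the terms a user might type instead.
SYNONYMS_CSV = """London and North Eastern Railway,LNER
Locomotive,train,steam engine
"""


def build_lookup(csv_text):
    """Map each lower-cased user term to its canonical term (one <-- many)."""
    lookup = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        canonical, *variants = [cell.strip() for cell in row]
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


lookup = build_lookup(SYNONYMS_CSV)
print(lookup.get("lner"))  # London and North Eastern Railway
```

Several user terms can point at one canonical term, which covers the "multiple synonyms pointing to one term" case.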
btw. this is low priority (for now) compared to the other issues, let's pick this back up once we've cleared off some of the more pressing issues / bugs
Not a pre-launch blocker as agreed with @jamieu, particularly given the dependency on re-indexing.
After more research I think we can avoid re-indexing the database, as we don't use a match_phrase query, just a simple multi_match. The match_phrase query was the reason I thought we would have to reindex the database with the synonyms analyzer; see https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html. I'm doing more tests, but I think we should just use simple expansion, i.e. a plain list of synonyms: "train", "steam engine", "locomotive". The idea is to convert the search term at query time into multiple queries, one matching each synonym.
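A rough sketch of what query-time expansion could look like, assuming the synonyms are held as groups of equivalent terms and each variant becomes its own `multi_match` clause inside a `bool`/`should`. The group contents, the field names and the `expand` / `build_query` helpers are all hypothetical; the real query in the application may be shaped differently.

```python
# Hypothetical synonym groups: every term in a group is treated as equivalent.
SYNONYM_GROUPS = [
    ["train", "steam engine", "locomotive"],
]


def expand(term):
    """Return the whole synonym group for a term, or just the term itself."""
    key = term.strip().lower()
    for group in SYNONYM_GROUPS:
        if key in group:
            return list(group)
    return [key]


def build_query(term, fields):
    """Build an Elasticsearch query body with one multi_match per synonym."""
    should = [
        {"multi_match": {"query": variant, "fields": fields}}
        for variant in expand(term)
    ]
    # At least one of the expanded clauses has to match.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# The field names here are assumptions, not the real index mapping.
body = build_query("train", ["name", "description"])
print(len(body["query"]["bool"]["should"]))  # 3
```

Because the expansion happens only when the query is built, the index itself is untouched and no reindex is needed.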
If we use the one <-- many mapping, we will need to reindex the database so that the common term is applied at both query time and index time.
@SimonLab Will update the list of values, where in the codebase are we currently storing them?
@jamieu I'm thinking of creating a new repo specifically for the synonyms, so that we don't add more maintenance features to the application itself. I think the list of synonyms will be stored in a JSON file.
That sounds like a good idea.
On Tuesday, 1 November 2016, Simon wrote:
> @jamieu I'm thinking of creating a new repo specifically for the synonyms to not add more maintenance features to the application itself. I think the list of synonyms will be stored in a json file.
@emilyfildes we need to work out what should go in this list; I assume existing analytics will help here.
Need to chat about this prior to launch, if only to ensure we have the very basics handled, i.e.:
- train: locomotive
- plane: aeroplane
- telephone: phone, mobile
- boat: shipping, marine
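As a sketch only, the starter list above could be written as groups in a JSON file and indexed so that any term in a group finds the others. The JSON layout and the `index_groups` name are assumptions, not an agreed format.

```python
import json

# The starter list as JSON groups (hypothetical file layout).
STARTER_JSON = """
[
  ["train", "locomotive"],
  ["plane", "aeroplane"],
  ["telephone", "phone", "mobile"],
  ["boat", "shipping", "marine"]
]
"""


def index_groups(groups):
    """Map every term to its full group; reject a term listed in two groups."""
    index = {}
    for group in groups:
        terms = [term.strip().lower() for term in group]
        for term in terms:
            if term in index:
                raise ValueError(f"{term!r} appears in more than one group")
            index[term] = terms
    return index


index = index_groups(json.loads(STARTER_JSON))
print(index["phone"])  # ['telephone', 'phone', 'mobile']
```

The duplicate check is there because a term sitting in two groups would make the expansion ambiguous.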
@SimonLab don't let us forget about this one, although it's probably best to pick it up once the index has stabilised (and we have the images all processed).
@jamieu Confirmed on 18-Mar-2019 that what is required to close this issue is additional documentation.