collectionsonline

Provide list of synonyms

Open iteles opened this issue 8 years ago • 10 comments

@jamie and Emily to provide a final list of synonyms.

iteles avatar Aug 31 '16 10:08 iteles

@SimonLab what format do we want these in and has it been coded to deal with multiple synonyms pointing to one term?

I was thinking (one <-- many):

  • "actual term to search for / what is already in the index" <-- "Search terms entered by user"
  • "London and North Eastern Railway" <-- "LNER" (what the user searches for)
  • "Locomotive" <-- ["train", "steam engine"]

Could supply a list as JSON or CSV (with the first value being the canonical term)
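As a rough illustration of the CSV option, the sketch below reads rows whose first value is the canonical term and builds a lookup from each user-entered term to it. The sample rows come from the examples above; the `SYNONYMS_CSV` layout and the `load_canonical_lookup` helper are assumptions made for illustration, not something already in the codebase.

```python
import csv
import io

# Hypothetical CSV layout: the first value on each row is the canonical term,
# the remaining values are search terms a user might enter.
SYNONYMS_CSV = """\
London and North Eastern Railway,LNER
Locomotive,train,steam engine
"""


def load_canonical_lookup(csv_text):
    """Map each user-entered term (lower-cased) to its canonical term."""
    lookup = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        canonical, *variants = [value.strip() for value in row]
        for variant in variants:
            lookup[variant.lower()] = canonical
    return lookup


lookup = load_canonical_lookup(SYNONYMS_CSV)
print(lookup["lner"])
print(lookup["steam engine"])
```

Lower-casing the keys is a design choice in this sketch so that "LNER" and "lner" resolve to the same canonical term.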

btw. this is low priority (for now) compared to the other issues, let's pick this back up once we've cleared off some of the more pressing issues / bugs

jamieu avatar Oct 06 '16 14:10 jamieu

Not a pre-launch blocker as agreed with @jamieu, particularly given the dependency on re-indexing.

iteles avatar Oct 13 '16 17:10 iteles

After more research I think we can avoid re-indexing the database, as we don't use a match_phrase query, just a simple multi_match. The match_phrase query was the reason I thought we would have to reindex the database with the synonyms analyzer; see https://www.elastic.co/guide/en/elasticsearch/guide/current/multi-word-synonyms.html. I'm doing more tests, but I think we should just use simple expansion, i.e. a flat list of synonyms: "train", "steam engine", "locomotive". The idea is to convert the term at query time into multiple queries, one matching each synonym.

If we use the one <-- many format, we will need to reindex the database so that the same canonical term is used at both query time and index time.
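A minimal sketch of the query-time expansion idea, assuming the synonyms are kept as flat groups and the search body is a `bool` query with one `multi_match` clause per synonym. The group contents, the field names in `SEARCH_FIELDS`, and both helper functions are hypothetical; the real index mapping and query code may look different.

```python
# Hypothetical synonym groups: every term in a group is treated as equivalent.
SYNONYM_GROUPS = [
    ["train", "steam engine", "locomotive"],
]

# Hypothetical field names; the real index mapping may differ.
SEARCH_FIELDS = ["title", "description"]


def expand_terms(term, groups=SYNONYM_GROUPS):
    """Return the term plus any synonyms from the group it belongs to."""
    needle = term.strip().lower()
    for group in groups:
        if needle in group:
            return list(group)
    return [needle]


def build_query(term, fields=SEARCH_FIELDS):
    """Build a bool query with one multi_match clause per synonym."""
    clauses = [
        {"multi_match": {"query": synonym, "fields": fields}}
        for synonym in expand_terms(term)
    ]
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


query = build_query("Train")
print(len(query["query"]["bool"]["should"]))
```

Because the expansion happens only when the query is built, nothing in this sketch touches the index, which is the point of preferring it over the one <-- many format.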

SimonLab avatar Oct 20 '16 14:10 SimonLab

@SimonLab Will update the list of values. Where in the codebase are we currently storing them?

jamieu avatar Oct 31 '16 16:10 jamieu

@jamieu I'm thinking of creating a new repo specifically for the synonyms, so as not to add more maintenance features to the application itself. I think the list of synonyms will be stored in a JSON file.

SimonLab avatar Nov 01 '16 10:11 SimonLab

That sounds like a good idea.

jamieu avatar Nov 01 '16 10:11 jamieu

@emilyfildes we need to work out what should go in this list; I assume existing analytics will help here.

jamieu avatar Nov 28 '16 11:11 jamieu

Need to chat about this prior to launch, if only to ensure we have the very basics handled, i.e.

  • train: locomotive
  • plane: aeroplane
  • telephone: phone, mobile
  • boat: shipping, marine

jamieu avatar Nov 28 '16 16:11 jamieu

@SimonLab don't let us forget about this one, although it's probably best to pick it up once the index has stabilised (and we have the images all processed).

jamieu avatar Dec 06 '16 15:12 jamieu

@jamieu Confirmed on 18-Mar-2019 that what is required to close this issue is additional documentation.

iteles avatar Mar 22 '19 22:03 iteles