Add machine learning to ElasticSearch results rankings

Open bigrig2212 opened this issue 6 years ago • 4 comments

Just starting a thread for this enhancement: add machine learning to search results ranking, so that results get better over time. This is especially important for a chatbot, since the first result may be your last chance to engage the user.

We will first need some form of training on good/bad rankings, as logged here: https://github.com/awslabs/aws-ai-qna-bot/issues/41

As for ML methods to improve rankings, I have come across this: https://github.com/o19s/elasticsearch-learning-to-rank

Eager to hear what other methods people have used/are familiar with.
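
To make the idea concrete, here is a rough sketch of pairwise learning to rank in plain Python. It is not how the plugin linked above works internally; it is a toy perceptron-style ranker. The two features (the Elasticsearch score and a past thumbs-up rate) and the function names are assumptions made up for illustration.

```python
def train_pairwise(pairs, n_features, epochs=20, lr=0.1):
    """Learn linear ranking weights from (better, worse) feature-vector pairs.

    Each pair says: for some question, the first answer should outrank the second.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:
                # The pair is misordered (or tied): nudge weights toward the better answer.
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def rerank(candidates, w):
    """Sort (answer_id, features) candidates by learned score, best first."""
    def score(candidate):
        return sum(wi * fi for wi, fi in zip(w, candidate[1]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical features: [elasticsearch_score, past_thumbs_up_rate]
feedback_pairs = [
    ([0.6, 0.9], [0.8, 0.1]),  # users preferred the lower-scored answer
    ([0.5, 0.8], [0.7, 0.2]),
]
weights = train_pairwise(feedback_pairs, n_features=2)
ranked = rerank([("a", [0.9, 0.1]), ("b", [0.7, 0.9])], weights)
```

With this toy feedback the learned weights favor answers that users rated well, so "b" ends up ahead of "a" despite its lower search score. A real setup would need far more feedback data and richer features.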

bigrig2212, Jan 14 '18 16:01

I like this thread. We can think even larger: what if we replaced Elasticsearch completely with some ML technique? I think some recommendation systems would be interesting here.

I am currently reading this paper to see if there are any novel ideas: https://arxiv.org/pdf/1611.08097.pdf

This could also be an interesting use of, or integration with, Amazon SageMaker: https://aws.amazon.com/sagemaker/

JohnCalhoun, Jan 16 '18 14:01

Another paper: https://arxiv.org/pdf/1704.00051.pdf

JohnCalhoun, Jan 16 '18 16:01

I think the first step here would be to integrate SageMaker with QnABot, so that in the SageMaker configuration the fulfillment Lambda calls the SageMaker endpoint instead of Elasticsearch.
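
Roughly, that call could look like the sketch below. It is written in Python for illustration and is not QnABot's actual fulfillment code. `invoke_endpoint` on the boto3 `sagemaker-runtime` client is a real call, but the endpoint name `qnabot-ranker`, the `{"question": ...}` request body, and the JSON response shape are all assumptions that depend on whatever model gets deployed.

```python
import json


def query_sagemaker(runtime, endpoint_name, question):
    """Send the user's question to a SageMaker endpoint and return the parsed JSON reply.

    `runtime` is a boto3 "sagemaker-runtime" client (or any object with the same
    invoke_endpoint method, which makes this easy to test without AWS).
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"question": question}),  # assumed request format
    )
    # The response body is a stream; read it and decode the (assumed) JSON payload.
    return json.loads(response["Body"].read())


def handler(event, context):
    """Hypothetical fulfillment handler that asks SageMaker instead of Elasticsearch."""
    import boto3  # imported here so the helper above can be tested without boto3

    runtime = boto3.client("sagemaker-runtime")
    # Assumes a Lex-style event carrying the user's utterance in "inputTranscript".
    return query_sagemaker(runtime, "qnabot-ranker", event["inputTranscript"])
```

Passing the client in as a parameter keeps the SageMaker call separate from the Lambda wiring, so the same helper could sit behind a configuration switch next to the existing Elasticsearch path.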

JohnCalhoun, Jan 16 '18 17:01

I like that paper you found, "Reading Wikipedia to Answer Open-Domain Questions": https://arxiv.org/pdf/1704.00051.pdf

It seems like they use a document retrieval service (similar to Elasticsearch) to first narrow the results, and then run the document reader model (the ML part) over those narrowed results to pick out a relevant answer snippet.

I wouldn't replace Elasticsearch. I think it's a great, clever tie-in to Lex, and it is very accessible. I do like the idea of extending it though, i.e. feeding the 5 highest-scoring documents from ES into a "document comprehension/reader" model in SageMaker.
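
As a rough sketch of that two-stage shape (retrieve, then read), here is a toy version in plain Python. Both stages are stand-ins invented for illustration: `retrieve` uses simple word overlap where the real thing would be an Elasticsearch query, and `read` picks the best-matching sentence where the real thing would be a reader model behind a SageMaker endpoint.

```python
import re


def tokens(text):
    """Lowercased word set; a crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=5):
    """Stage 1 (stand-in for Elasticsearch): keep the k documents with most word overlap."""
    q = tokens(question)
    return sorted(documents, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def read(question, document):
    """Stage 2 (stand-in for a reader model): return (score, best sentence) for one document."""
    q = tokens(question)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    return max(((len(q & tokens(s)), s) for s in sentences), default=(0, ""))


def answer(question, documents, k=5):
    """Run the reader over only the top-k retrieved documents and return the best snippet."""
    candidates = retrieve(question, documents, k)
    best = max((read(question, d) for d in candidates), default=(0, ""))
    return best[1]


docs = [
    "Lex is a service for building chatbots. It supports voice and text.",
    "To reset your password, open the settings page. Then click reset.",
    "Billing is monthly.",
]
snippet = answer("How do I reset my password?", docs)
```

The point of the split is cost: the cheap retriever narrows the field so the expensive reader only has to look at a handful of documents per question.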

bigrig2212, Jan 16 '18 21:01