node-question-answering

QAing on a larger corpus?

Open bigrig2212 opened this issue 4 years ago • 1 comment

Hi. Awesome project, so fun. What is the technique for asking a question against a larger corpus, say hundreds of documents rather than a short text snippet as in the sample code? It seems to take longer and get less accurate the more text I supply, so I'm wondering if there's another technique for working with a larger corpus. Filter first using TF-IDF and then run this QA only on the returned documents?

Thx.

bigrig2212, May 09 '20 04:05

You could divide your corpus into sections of "a reasonable size" (whatever that size is), run QnA on all sections, perhaps in parallel, then sort the answers by the score returned by the model and take the 10 answers with the highest scores.
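A minimal sketch of that chunk, answer, and rank flow, in TypeScript. Everything named here (`chunkWords`, `rankAnswers`, `topAnswers`, the `AnswerFn` type, the 200-word chunk size and 50-word overlap) is an assumption made for illustration, not something taken from the library; the model call is passed in as a function so the sketch stays independent of any particular QA client.

```typescript
// One answer from the model: the extracted span and its confidence score.
interface Answer {
  text: string;
  score: number;
}

// Hypothetical adapter type: any async function that answers a question
// against a single passage, or resolves to null when it finds nothing.
type AnswerFn = (question: string, passage: string) => Promise<Answer | null>;

// Split text into chunks of at most `maxWords` words. Consecutive chunks
// share `overlap` words so an answer near a boundary is less likely to be cut.
function chunkWords(text: string, maxWords = 200, overlap = 50): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const step = Math.max(1, maxWords - overlap);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + maxWords).join(" "));
    if (start + maxWords >= words.length) break; // last chunk reached the end
  }
  return chunks;
}

// Drop empty results, sort by score (highest first), keep the top k.
function rankAnswers(answers: Array<Answer | null>, k = 10): Answer[] {
  const found: Answer[] = [];
  for (const a of answers) {
    if (a !== null) found.push(a);
  }
  return found.sort((x, y) => y.score - x.score).slice(0, k);
}

// Run the supplied answer function over every chunk of every document.
async function topAnswers(
  question: string,
  docs: string[],
  answer: AnswerFn,
  k = 10
): Promise<Answer[]> {
  const chunks: string[] = [];
  for (const doc of docs) chunks.push(...chunkWords(doc));
  // Promise.all starts every call at once; with hundreds of chunks a
  // concurrency limit would probably be needed in practice.
  const answers = await Promise.all(chunks.map((c) => answer(question, c)));
  return rankAnswers(answers, k);
}
```

With this project, `answer` would presumably be a thin wrapper around the client's `predict(question, passage)` call, on the assumption that it resolves to an object carrying `text` and `score`. Scores from different chunks are compared directly here, which is a simplification: a confident answer from an irrelevant chunk can still outrank the right one.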

You will potentially end up running QnA on lots of irrelevant text.

Is there a way your corpus is structured so that you can filter it down?
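If there is no such structure, the TF-IDF idea from the question is one way to filter. Below is a rough TypeScript sketch of that pre-filter: score each passage by the TF-IDF weight of the question's terms and pass only the top few on to QnA. The function names (`tokenize`, `tfidfTopN`), the tokenizer, and the smoothed IDF formula are assumptions chosen for illustration; a real setup would likely use a search library or index instead.

```typescript
// Lowercase and split on anything that is not a letter or digit.
// Deliberately naive: no stemming, no stop-word removal, ASCII only.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Return the indices of the `n` passages that best match the query,
// ranked by the summed TF-IDF weight of the query terms.
// Passages that share no term with the query are dropped.
function tfidfTopN(query: string, passages: string[], n = 5): number[] {
  const tokenized = passages.map(tokenize);

  // Document frequency: in how many passages does each term occur?
  const df: Record<string, number> = {};
  for (const tokens of tokenized) {
    const seen: Record<string, boolean> = {};
    for (const term of tokens) {
      if (!seen[term]) {
        seen[term] = true;
        df[term] = (df[term] || 0) + 1;
      }
    }
  }

  // Unique query terms.
  const queryTerms = tokenize(query).filter((t, i, all) => all.indexOf(t) === i);

  const scored = tokenized.map((tokens, index) => {
    let score = 0;
    for (const term of queryTerms) {
      const count = tokens.filter((t) => t === term).length;
      const tf = count / Math.max(1, tokens.length);
      // Smoothed inverse document frequency (an assumed variant).
      const idf = Math.log((1 + passages.length) / (1 + (df[term] || 0))) + 1;
      score += tf * idf;
    }
    return { index, score };
  });

  return scored
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, n)
    .map((s) => s.index);
}
```

The returned indices select the passages to hand to the QnA model. Because matching is purely lexical, a passage that answers the question in different words will be filtered out, so `n` is better set generously.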

You could also label sections of your corpus with a set of categories, then build a corpus of questions labelled with the same categories. Run classification on the incoming question to get its category, filter the corpus by that category, and then run QnA.

That would require a labelled dataset for both corpus sections and a good corpus of questions.
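The routing step itself is small once the labels exist. A hedged TypeScript sketch follows, in which `LabelledSection`, `QuestionClassifier`, `sectionsForQuestion`, and `keywordClassifier` are all hypothetical names. The keyword matcher is only a stand-in so the example runs; it is not the trained classifier described above.

```typescript
// A corpus section tagged with one category label.
interface LabelledSection {
  category: string;
  text: string;
}

// Hypothetical classifier type: maps a question to a category label.
type QuestionClassifier = (question: string) => string;

// Keep only the sections whose label matches the question's category.
function sectionsForQuestion(
  question: string,
  sections: LabelledSection[],
  classify: QuestionClassifier
): string[] {
  const category = classify(question);
  return sections.filter((s) => s.category === category).map((s) => s.text);
}

// Stand-in classifier: the first category with a keyword found in the
// question wins. A model trained on labelled questions would replace this.
function keywordClassifier(
  keywords: Record<string, string[]>,
  fallback: string
): QuestionClassifier {
  return (question: string): string => {
    const q = question.toLowerCase();
    for (const category of Object.keys(keywords)) {
      if (keywords[category].some((k) => q.indexOf(k) !== -1)) return category;
    }
    return fallback;
  };
}
```

The filtered texts are what then go to QnA. A misclassified question removes the right sections entirely, so falling back to the full corpus when the classifier is unsure may be worth considering.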

martinnormark, May 26 '20 21:05