Jimmy Lin

Results: 211 issues by Jimmy Lin

Hi there, thanks for sharing your QA resource! https://github.com/deepset-ai/COVID-QA/tree/master/data/question-answering I was wondering if you have a write-up of the annotation methodology? For example, how were the documents selected, how were...

question

Hi there, thanks for providing this nice resource! Looking at your paper, I think your BM25 baselines are a bit low? You report 0.218 nDCG@10 on MS MARCO, if I'm...

https://github.com/castorini/anserini/blob/master/docs/regressions-hc4-v1.0-ru.md#effectiveness RM3 only gets 0.0821. Possibly a bug? We should look into it...

Look at #1875. Based on my preliminary tests, Lucene 9 has better analyzers for Russian. We should try HC4 and see if it makes a difference. @ToluClassics, can you try...

We merged this without doing a performance analysis: #1857. However, the multithreaded impl actually turns out to be slower... This was originally reported here in Pyserini: https://github.com/castorini/pyserini/pull/1178, but @HAKSOAT confirmed...

To index JsonVectorCollection sparse vectors, we currently use the "fake words" trick: we just duplicate the word _X_ times, where _X_ is the score. This might be a better solution:...
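To make the "fake words" trick concrete, here is a minimal Python sketch of the expansion step: each term is repeated as many times as its weight, so the term frequency recorded at indexing time equals that weight. It assumes the weights are non-negative and already quantized to integers; rounding any fractional weight is an assumption of this sketch, not necessarily what Anserini does. The function name `to_fake_words` is hypothetical.

```python
from typing import Mapping


def to_fake_words(vector: Mapping[str, float]) -> str:
    """Expand a sparse vector into a "fake words" pseudo-document.

    Each term is repeated round(weight) times, so its term frequency in
    the resulting text equals its (quantized) weight. Terms whose weight
    rounds to 0 disappear. Rounding is an assumption of this sketch.
    """
    tokens = []
    for term, weight in vector.items():
        if weight < 0:
            raise ValueError(f"negative weight for term {term!r}: {weight}")
        count = round(weight)  # assumed quantization to an integer count
        tokens.extend([term] * count)
    return " ".join(tokens)


if __name__ == "__main__":
    print(to_fake_words({"covid": 3, "vaccine": 1}))
```

Running this prints `covid covid covid vaccine`. Note that the pseudo-document's length grows with the sum of the weights, which is one reason to look for an alternative to duplicating tokens.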

We should develop a generic mechanism to store and use Waterloo spam scores, PageRank, HITS, and other static priors. @iorixxx Do you have some code to contribute along these lines?
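One possible shape for such a mechanism, sketched in Python under stated assumptions: priors are loaded from a hypothetical tab-separated `docid<TAB>value` source, documents without a prior fall back to a default value, and the prior is combined with the retrieval score by simple linear interpolation (`score + weight * prior`). The names `StaticPriors` and `rerank_with_prior` are illustrative only. They do not exist in Anserini or Pyserini, and the combination rule is just one simple choice among several.

```python
from typing import Dict, Iterable, List, Tuple


class StaticPriors:
    """Hypothetical store of query-independent document scores
    (e.g., spam scores or PageRank), keyed by docid."""

    def __init__(self, priors: Dict[str, float], default: float) -> None:
        self.priors = priors
        self.default = default  # used for docids with no stored prior

    @classmethod
    def from_tsv(cls, lines: Iterable[str], default: float) -> "StaticPriors":
        # Each line is assumed to be "docid<TAB>value".
        priors: Dict[str, float] = {}
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            docid, value = line.split("\t")
            priors[docid] = float(value)
        return cls(priors, default)

    def get(self, docid: str) -> float:
        return self.priors.get(docid, self.default)


def rerank_with_prior(
    hits: List[Tuple[str, float]], priors: StaticPriors, weight: float
) -> List[Tuple[str, float]]:
    """Add weight * prior to each (docid, score) hit and re-sort by the
    combined score, highest first."""
    rescored = [(docid, score + weight * priors.get(docid)) for docid, score in hits]
    rescored.sort(key=lambda hit: hit[1], reverse=True)
    return rescored
```

For example, with priors `d1 = 0.5` and `d2 = 2.0` and `weight=1.0`, the hits `[("d1", 10.0), ("d2", 9.5)]` are reordered to `[("d2", 11.5), ("d1", 10.5)]`. Keeping storage (`StaticPriors`) separate from the combination rule would let spam scores, PageRank, HITS, and other priors share one loader while each one experiments with its own weighting.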

Currently, we're extending MultifieldSourceDocument, which probably shouldn't be the case.

We don't have regression tests for SDM. We should fix this.