Jimmy Lin comments

Results: 251 comments of Jimmy Lin

Strategic vs. tactical

Tactical:
+ What's the next experiment I should run?
+ Is _A_ or _B_ more promising?
+ When should I arXiv this?

Strategic:
+ What do I want to be...

Add ability to compute trec_eval metrics directly on in-memory data structures

Hi @joaopalotti - Looking at your code, `TrecEval` takes a `TrecRun`, which reads an external file into a DF. So if we generate a DF that corresponds to the same...
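To make the idea of scoring a run without touching the filesystem concrete, here is a minimal sketch in plain Python. It does not use the `TrecEval` or `TrecRun` classes mentioned above, and it is not trec_eval itself: the names `run_rows`, `qrels`, and `mean_reciprocal_rank` are hypothetical, the rows simply follow the standard six-column TREC run layout, and averaging over the queries present in the qrels is an assumption made for illustration.

```python
from collections import defaultdict

# Hypothetical in-memory run, one tuple per row in TREC run column order:
# (query id, "Q0", doc id, rank, score, run tag)
run_rows = [
    ("q1", "Q0", "d1", 1, 12.0, "demo"),
    ("q1", "Q0", "d2", 2, 11.0, "demo"),
    ("q1", "Q0", "d3", 3, 9.5, "demo"),
    ("q2", "Q0", "d4", 1, 8.0, "demo"),
    ("q2", "Q0", "d5", 2, 7.0, "demo"),
]

# Hypothetical in-memory qrels: query id -> {doc id: relevance grade}
qrels = {"q1": {"d1": 1, "d3": 1}, "q2": {"d5": 1}}


def mean_reciprocal_rank(run_rows, qrels):
    """Average, over the queries in qrels, of 1/rank of the first relevant hit."""
    # Group (rank, doc id) pairs by query so each ranking can be sorted.
    ranked = defaultdict(list)
    for qid, _q0, docid, rank, _score, _tag in run_rows:
        ranked[qid].append((rank, docid))

    total = 0.0
    for qid, judged in qrels.items():
        for rank, docid in sorted(ranked.get(qid, [])):
            if judged.get(docid, 0) > 0:
                total += 1.0 / rank
                break  # only the first relevant document counts
    return total / len(qrels) if qrels else 0.0


print(mean_reciprocal_rank(run_rows, qrels))
```

The same grouping logic would carry over to a DataFrame with these six columns; the tuple list is used here only to keep the sketch free of dependencies.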

Add ability to compute trec_eval metrics directly on in-memory data structures

I'm happy to provide all the runs in Anserini for your unit tests!

evaluate_anserini_bm25.py retrieves 1000 documents for each query no matter which k I set in payload

@thakur-nandan File an issue to redirect the BM25 baselines over to Pyserini? Will save you from having to answer such queries again in the future...

Use `KoreanAnalyzer` for Korean language (ko)

Hi @sudokim, thanks for the PR! Do you have any idea if effectiveness improves as a result of switching the analyzer? E.g., on MIRACL or Mr. TyDi?

Use `KoreanAnalyzer` for Korean language (ko)

Great! Do you happen to have MRR scores? And also results on MIRACL? (Which will give us nDCG scores.)

Use `KoreanAnalyzer` for Korean language (ko)

Awesome, that's great! We'll get this merged in... but it triggers a long dependency chain... we need to fix the regression... we also need to fix the pre-built indexes for...

Errors with openai-ada2-int8 regressions: GCLocker errors

@tteofili tells me this is the fix: https://github.com/apache/lucene/pull/13090

Errors with openai-ada2-int8 regressions: GCLocker errors

Interesting, @tteofili - I'm playing with JDK 21. (On master, we are still on JDK 11.) Note, this is still on Lucene 9.9.1: with `-Xmx 31G`, got `java.lang.OutOfMemoryError: Java heap...

Errors with openai-ada2-int8 regressions: GCLocker errors

Closed by #2448