Jimmy Lin
This is a good starting point: https://github.com/castorini/anserini/blob/master/src/main/resources/fuse_regression/beir-v1.0.0-robust04.flat.bm25.fuse.bge-base-en-v1.5.bge-flat-onnx.yaml

But I have suggestions for improvements. Instead of:

```
runs:
  - runs/run.beir-v1.0.0-robust04.flat.bm25.topics.beir-v1.0.0-robust04.test.txt
  - runs/run.beir-v1.0.0-robust04.bge-base-en-v1.5.bge-flat-onnx.topics.beir-v1.0.0-robust04.test.txt
```

Maybe we can do something like: ``` runs:...
Surfacing https://github.com/castorini/anserini/pull/2582#issuecomment-2344407889
The [`run_regression.py`](https://github.com/castorini/anserini/blob/master/src/main/python/run_regression.py) script generates regression commands, which might be slightly different from the ones generated in the docs. We should align these two code paths.
We use thread pools in many places throughout the codebase, e.g., https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/HnswDenseSearcher.java. Starting with Java 19, I believe the best practice is to use try-with-resources blocks, e.g., https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/concurrent/ExecutorService.html ``` try (ExecutorService e =...
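A minimal sketch of the try-with-resources pattern, assuming JDK 19 or later, where `ExecutorService` extends `AutoCloseable` and its `close()` method waits for already-submitted tasks to finish before shutting the pool down. The class `ExecutorTryWithResources` and the method `scoreTopic` are hypothetical placeholders for per-query work, not actual Anserini code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorTryWithResources {
  // Hypothetical stand-in for per-query work (e.g., searching one topic).
  static long scoreTopic(int topicId) {
    return (long) topicId * topicId;
  }

  static long runAll(int numTopics, int threads) throws InterruptedException, ExecutionException {
    List<Future<Long>> futures = new ArrayList<>();
    // Since Java 19, ExecutorService is AutoCloseable: leaving this block calls
    // close(), which waits for submitted tasks to complete and then shuts the
    // pool down, so no explicit shutdown()/awaitTermination() is needed.
    try (ExecutorService executor = Executors.newFixedThreadPool(threads)) {
      for (int i = 1; i <= numTopics; i++) {
        final int topicId = i;
        futures.add(executor.submit(() -> scoreTopic(topicId)));
      }
    }
    long total = 0;
    for (Future<Long> f : futures) {
      total += f.get(); // every task has already finished at this point
    }
    return total;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(runAll(10, 4));
  }
}
```

Compared with the pre-19 idiom of calling `shutdown()` followed by `awaitTermination(...)` in a `finally` block, the try-with-resources form cannot forget to shut the pool down, even if a task submission throws.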
We have two versions of `QueryEncoder` and two versions of `AutoQueryEncoder`. One set of classes is in `pyserini.search.faiss`; the other set is in `pyserini.encode`.
+ `QueryEncoder` in [`pyserini/search/faiss/_searcher.py`](https://github.com/castorini/pyserini/blob/67d07a0554c67b0728f9226fb273b0bc0a7306bd/pyserini/search/faiss/_searcher.py#L50)
+ `QueryEncoder`...
At Pyserini commit [`e68d54`](https://github.com/castorini/pyserini/commit/e68d544c148a530407a91e8df7632128628ece0a) (2024/10/12), I'm running:

```
nohup python -m pyserini.2cr.atomic --all --directory runs/ --display-commands >& logs/log.atomic &
```

I'm getting an error: ``` condition base-t2i: - Model: ViT-L-14.laion2b_s32b_b82k MRR@10...
We'd like to create "core" and "optional" dependencies, with `faiss`, `nmslib`, `lightgbm`, and a few others moved into optional. This is because installing these dependencies can be quite tricky, so...
Let's start with MS MARCO v1, since it's small and manageable, and then work our way up to MS MARCO v2.1.
We should add a blurb to the documentation on how to clean up partial downloads of prebuilt indexes.
This question has been asked a few times already, most recently in https://github.com/castorini/pyserini/discussions/1939 and https://github.com/castorini/pyserini/discussions/1593. We should clearly document the answer.