Jimmy Lin issues

Results: 211 issues of Jimmy Lin

Improvements to fusion regression yaml

This is a good starting point: https://github.com/castorini/anserini/blob/master/src/main/resources/fuse_regression/beir-v1.0.0-robust04.flat.bm25.fuse.bge-base-en-v1.5.bge-flat-onnx.yaml

But I have suggestions for improvements. Instead of:

```
runs:
  - runs/run.beir-v1.0.0-robust04.flat.bm25.topics.beir-v1.0.0-robust04.test.txt
  - runs/run.beir-v1.0.0-robust04.bge-base-en-v1.5.bge-flat-onnx.topics.beir-v1.0.0-robust04.test.txt
```

Maybe we can do something like: ``` runs:...

Try parquet-floor

Surfacing https://github.com/castorini/anserini/pull/2582#issuecomment-2344407889

Align commands from `run_regression.py` and auto-generated docs

The [`run_regression.py`](https://github.com/castorini/anserini/blob/master/src/main/python/run_regression.py) script generates regression commands, which might differ slightly from the ones generated in the docs. We should align these two code paths.

Refactor ThreadPoolExecutor to use try-with-resources

We use thread pools in many places throughout the codebase, e.g., https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/search/HnswDenseSearcher.java

I believe the best practice starting with Java 19 is to use try-with-resources blocks, e.g., https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/concurrent/ExecutorService.html ``` try (ExecutorService e =...
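For illustration, here is a minimal sketch of the try-with-resources pattern, assuming Java 19 or later (where `ExecutorService` implements `AutoCloseable`). The class `TryWithResourcesSketch` and the method `sumOfSquares` are hypothetical names made up for this example; they are not Anserini code, and the actual refactoring in `HnswDenseSearcher` may need to look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class; not part of Anserini.
class TryWithResourcesSketch {
  // Sums the squares of 1..n, computing each square on a pool thread.
  static int sumOfSquares(int n, int threads)
      throws InterruptedException, ExecutionException {
    List<Future<Integer>> futures = new ArrayList<>();

    // Requires Java 19+: ExecutorService.close() runs when the block exits.
    try (ExecutorService executor = Executors.newFixedThreadPool(threads)) {
      for (int i = 1; i <= n; i++) {
        final int x = i;
        futures.add(executor.submit(() -> x * x));
      }
    } // close() shuts the pool down and waits for submitted tasks to finish

    // All tasks have completed by now, so get() does not block.
    int sum = 0;
    for (Future<Integer> future : futures) {
      sum += future.get();
    }
    return sum;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(sumOfSquares(4, 2)); // prints 30
  }
}
```

Compared with an explicit `shutdown()` plus `awaitTermination()` in a `finally` block, this keeps the pool's lifetime tied to a single lexical scope, so the pool is shut down even if task submission throws.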

WTF? Two versions of QueryEncoder and two versions of AutoQueryEncoder

We have two versions of `QueryEncoder` and two versions of `AutoQueryEncoder`. One set of classes is in `pyserini.search.faiss`; the other set is in `pyserini.encode`.

+ `QueryEncoder` in [`pyserini/search/faiss/_searcher.py`](https://github.com/castorini/pyserini/blob/67d07a0554c67b0728f9226fb273b0bc0a7306bd/pyserini/search/faiss/_searcher.py#L50)
+ `QueryEncoder`...

Verifying AToMiC regressions

At Pyserini commit [`e68d54`](https://github.com/castorini/pyserini/commit/e68d544c148a530407a91e8df7632128628ece0a) (2024/10/12), I'm running:

```
nohup python -m pyserini.2cr.atomic --all --directory runs/ --display-commands >& logs/log.atomic &
```

Getting an error: ``` condition base-t2i: - Model: ViT-L-14.laion2b_s32b_b82k MRR@10...

Create "core" and "optional" dependencies

Integrate Arctic embeddings into pyserini

Update documentation: how to clean up partial downloads of prebuilt index

Update docs about fetching doc text given docid for dense indexes