Results 20 comments of Keshav Santhanam

Could you re-run with the environment variable `COLBERT_LOAD_TORCH_EXTENSION_VERBOSE=True` set? It's possible that you need to erase your Torch extensions cache so the Torch extension code recompiles.

Can you try removing this folder and running again? `/home/zzh/.cache/torch_extensions/py38_cu113`
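The two suggestions above can be combined in a small script. This is a hedged sketch, not part of ColBERT itself: the environment variable name comes from the comment above, and the cache folder name (`py38_cu113` here) depends on your Python and CUDA versions, so adjust it to match your machine.

```python
import os
import pathlib
import shutil

# Enable verbose output when ColBERT loads/compiles its Torch extension
# (variable name taken from the maintainer's suggestion above).
os.environ["COLBERT_LOAD_TORCH_EXTENSION_VERBOSE"] = "True"

# Remove a stale torch_extensions cache so the extension recompiles.
# The exact subfolder name depends on your Python/CUDA versions.
cache_dir = pathlib.Path.home() / ".cache" / "torch_extensions" / "py38_cu113"
if cache_dir.exists():
    shutil.rmtree(cache_dir)
```

Run your ColBERT script afterwards in the same environment so the variable is visible to it.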

The `search` function in ColBERT accepts a `pids` argument which can be used to rank only the given documents.
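To illustrate what restricting the search to a `pids` list does, here is a plain-Python sketch (not the ColBERT implementation): when a candidate set is supplied, only those passage IDs are scored and ranked.

```python
def search(query_scores, pids=None, k=3):
    """Toy ranker: query_scores maps pid -> relevance score.

    If pids is given, only those candidates are considered,
    mirroring the effect of Searcher.search(..., pids=...).
    """
    candidates = query_scores if pids is None else {p: query_scores[p] for p in pids}
    return sorted(candidates, key=candidates.get, reverse=True)[:k]

scores = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.7}
print(search(scores, pids=[0, 2, 3], k=2))  # → [3, 2]
```

Note that pid 1, the highest-scoring passage overall, is excluded because it is not in the candidate list.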

Apologies for the delay in getting back to you - the existing metadata filtering feature is given by the `filter_fn` parameter in `Searcher.search` and `Searcher.search_all`. This user-defined `filter_fn` takes as...

The only way to do this in ColBERT would be to pre-filter (https://www.pinecone.io/learn/vector-search-filtering/) the passages that meet the metadata-based filter and then treat those as the candidate passages. There's currently...
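The pre-filtering idea can be sketched in a few lines of plain Python. The passage records, metadata fields, and `prefilter` helper below are all hypothetical; the point is only that a metadata predicate runs first, and the surviving pids become the candidate set handed to the reranker.

```python
# Hypothetical passage store with per-passage metadata.
passages = [
    {"pid": 0, "year": 2019, "text": "..."},
    {"pid": 1, "year": 2022, "text": "..."},
    {"pid": 2, "year": 2023, "text": "..."},
]

def prefilter(passages, predicate):
    """Return the pids of passages satisfying the metadata predicate."""
    return [p["pid"] for p in passages if predicate(p)]

# Keep only recent passages, then rerank just those candidates.
candidate_pids = prefilter(passages, lambda p: p["year"] >= 2022)
print(candidate_pids)  # → [1, 2]
```

These `candidate_pids` would then be passed to the search call (e.g. via the `pids` argument mentioned earlier in this thread) so only the filtered documents are ranked.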

The method I proposed would do a brute-force kNN search on the passed-in candidate passages, though if implemented correctly this method would still benefit from the relevance score approximation optimizations...

@VThejas want to try this out? @okhat correctly pointed out that this should reduce latency for reranking

I think this was auto-requested? Not sure how the new CI works yet, but no explicit need for mamba review here.

> LGTM but are there any perf implications of doing the de-tokenization on the engine side rather than the client side? @kanz-nv Will this overhead go away with async scheduling...