Dzung Bui
I added the change log entry and bumped the FSTPostingsFormat version (this isn't entirely related to this PR, but the naming convention seems outdated). The change for FSTCompiler can be merged...
I'm not sure why FSTPostingsFormat differs from the rest in that it writes both the metadata and the data to the same file. I think writing to separate files would be...
Thanks @mikemccand for the clarification! Do you think we should still make this change? One benefit is that it can be used for reference. Otherwise I'll close this PR.
Besides the optimization of manipulating the internal byte[] directly, I think this is good to go.
I think it's good to go, but I don't have merge permission. Mike should be able to help you; otherwise you can try notifying the dev mailing list as suggested...
The build fails with `The import org.apache.lucene.codecs.lucene100 cannot be resolved`. I thought this was already in mainline. Will check. Edit: It has been moved to [backward codecs](https://github.com/apache/lucene/blob/main/lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene100/Lucene100Codec.java#L56). Will use something...
I have a preliminary benchmark here (top-k=100, fanout=0) using the Cohere 768 dataset. Anyhow, I can see these two things that should be addressed: - If we access the full-sized...
Edit: My previous benchmark was wrong because the vectors were corrupted. The first benchmark shows the recall improvement for each oversample value with reranking. It now aligns with what was produced in...
Also, this is the luceneutil branch I used for benchmarking: https://github.com/dungba88/luceneutil/tree/dungba88/two-phase-search, which incorporates the test for the BQ implementation by @benwtrent and the two-phase search.
> I'm curious about https://github.com/apache/lucene/pull/14009#issuecomment-2502665806 -- why is recall better for 1bit and 4bit than 7bit, when reranking? The graph is a bit confusing, but the dots are the oversample...
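For readers unfamiliar with the oversample-and-rerank idea these comments refer to, here is a minimal, self-contained sketch. It is not the code from the PR or from luceneutil: the class `TwoPhaseSearchSketch`, its methods, and the sign-based 1-bit quantization are all hypothetical stand-ins chosen for illustration, and it uses a brute-force scan rather than an HNSW graph. Phase one ranks every vector with a cheap quantized score and keeps `k * oversample` candidates; phase two rescores only those candidates with the full-precision dot product.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Lucene.
class TwoPhaseSearchSketch {

  /** Toy 1-bit quantization (an assumption): keep only the sign of each dimension. */
  static boolean[] quantize(float[] v) {
    boolean[] bits = new boolean[v.length];
    for (int i = 0; i < v.length; i++) {
      bits[i] = v[i] > 0f;
    }
    return bits;
  }

  /** Cheap approximate score: number of dimensions whose sign bits match. */
  static int approxScore(boolean[] a, boolean[] b) {
    int matches = 0;
    for (int i = 0; i < a.length; i++) {
      if (a[i] == b[i]) {
        matches++;
      }
    }
    return matches;
  }

  /** Exact score: full-precision dot product. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Returns the ids of the top-k vectors after oversampling and reranking. */
  static int[] search(float[][] vectors, float[] query, int k, float oversample) {
    int n = vectors.length;
    int candidates = Math.min(n, (int) Math.ceil(k * oversample));

    // Phase 1: rank all vectors by the quantized score, highest first.
    boolean[] queryBits = quantize(query);
    int[] approx = new int[n];
    Integer[] ids = new Integer[n];
    for (int i = 0; i < n; i++) {
      ids[i] = i;
      approx[i] = approxScore(quantize(vectors[i]), queryBits);
    }
    Arrays.sort(ids, (a, b) -> Integer.compare(approx[b], approx[a]));

    // Phase 2: rescore only the shortlisted candidates with full precision.
    Integer[] shortlist = Arrays.copyOf(ids, candidates);
    float[] exact = new float[n];
    for (int id : shortlist) {
      exact[id] = dot(vectors[id], query);
    }
    Arrays.sort(shortlist, (a, b) -> Float.compare(exact[b], exact[a]));

    int[] result = new int[Math.min(k, candidates)];
    for (int i = 0; i < result.length; i++) {
      result[i] = shortlist[i];
    }
    return result;
  }
}
```

With `oversample = 1` the result depends entirely on the quantized ranking, so two vectors with identical sign bits cannot be told apart. A larger oversample gives the exact rescoring step a chance to promote the true nearest neighbor, which is the kind of recall gain the benchmarks above measure.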