fast-elasticsearch-vector-scoring
Should we use refresh=true on production? (Test is not suitable for 7.9)
Hello, I would like to raise an issue about the test case.
According to the file fast-elasticsearch-vector-scoring/src/test/java/com/liorkn/elasticsearch/PluginTest.java,
the test case sets params.put("refresh", "true") on the data insertion request, which makes the vector scoring results come out correctly.
public void test() throws Exception {
final Map<String, String> params = new HashMap<>();
params.put("refresh", "true");
However, in production we often do not use the refresh=true option on data insertion.
Without this option, the vector scoring results are incorrect (for example, several result documents share the same cosine similarity score).
I think one of two options should be considered
- Modify the test code (removing `refresh=true`) -> this may require code-level modification of the vector scoring plugin
- Mention in the documentation that the `refresh=true` option must be provided on data insertion
There is no need to use refresh=true in production. This plugin does not require it. refresh=true just means the inserted doc is queryable immediately after insertion, which is a must for the test. https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html
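For reference, the `refresh` query parameter on Elasticsearch write requests accepts three values: `true`, `wait_for`, and `false` (the default). The following is a minimal sketch of how a test or client might build that parameter; the class `RefreshParams`, its enum, and its helper methods are hypothetical names used only for illustration and are not part of the plugin or of any Elasticsearch client. Nothing in this thread establishes which of these values, other than `true`, makes the scoring behave correctly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the plugin: builds the "refresh" parameter
// for a document write request.
class RefreshParams {

    enum Policy {
        TRUE("true"),          // refresh the affected shards right after the write
        WAIT_FOR("wait_for"),  // respond only after a refresh has made the write visible
        FALSE("false");        // default: the write becomes visible at a later refresh

        final String value;

        Policy(String value) {
            this.value = value;
        }
    }

    // Query parameters in the same Map<String, String> shape the test uses.
    static Map<String, String> forPolicy(Policy policy) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("refresh", policy.value);
        return params;
    }

    // Path and query string for indexing one document by id (assumed index/id names).
    static String indexEndpoint(String index, String id, Policy policy) {
        return "/" + index + "/_doc/" + id + "?refresh=" + policy.value;
    }
}
```

For example, `RefreshParams.forPolicy(RefreshParams.Policy.TRUE)` yields the same single-entry map that the test builds with `params.put("refresh", "true")`.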
Hello, thank you for your reply.
The issue can be specified like this.
- Case 1. insert single data without `refresh=true` => vector scoring fail (cosine similarity)
- Case 2. insert single data with `refresh=true` => vector scoring success (cosine similarity)
- Case 3. bulk insert with `refresh=true` => vector scoring fail (cosine similarity)
`vector scoring fail` means that several result documents have the same similarity score, so the correct ordering of the result vectors is not guaranteed.
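The symptom described above (several hits sharing one score) can be checked mechanically. A minimal sketch, assuming the scores of the returned hits have already been extracted into a list in result order; the class `ScoreCheck` and its method are hypothetical names, not part of the plugin or its test suite.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the plugin: flags the "same score" symptom.
class ScoreCheck {

    // Returns true if at least two hits carry exactly the same score.
    static boolean hasDuplicateScores(List<Double> scores) {
        Set<Double> seen = new HashSet<>();
        for (Double score : scores) {
            // Set.add returns false when the value is already present.
            if (!seen.add(score)) {
                return true;
            }
        }
        return false;
    }
}
```

Note that identical scores are only a symptom when the indexed vectors are distinct with respect to the query; two documents with the same vector legitimately receive the same cosine similarity.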
In summary, I think the problem is that refresh=true is required to get a "proper vector scoring result".
This issue may be related to compatibility with Elasticsearch core, since all of the above cases resulted in vector scoring success on Elasticsearch 7.5.2.
Right, thanks for clarifying. I was just able to reproduce it in a test.
At the moment I have no clue why the behaviour changed unexpectedly in this ES version.
In the meantime we need to use refresh=true on doc insertions, which is not ideal.