Lucene optimization for time series data

Description

Hi, I recently read a VLDB paper that claims significant performance improvements over Lucene. It reports a 20x speedup on standard queries and a 10x speedup on histogram queries in massive log query scenarios.

After reading the whole paper, it seems that the core idea in this paper is similar to IndexSortSortedNumericDocValuesRangeQuery. Does anyone have time to read the paper and discuss it here?

Sep 14 '22 02:09 LuXugang

Hi LuXugang, I have skimmed the paper, and I think it contains a lot of interesting optimizations for Lucene. I'm really interested in the reverse binary search algorithm for tail queries mentioned in the paper, although I am not very familiar with Lucene's query implementation 😭. Could you tell me which Lucene files I should read so that I can implement that algorithm?

Sep 15 '22 06:09 tang-hi

it seems that the core idea in this paper is similar to IndexSortSortedNumericDocValuesRangeQuery

This is my understanding as well, though it says it uses the BKD tree to figure out the range of doc IDs, not doc values, which seems to be the idea that is proposed at https://github.com/apache/lucene/pull/687 (which I just realized I had completely forgotten about :grimacing:).
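For readers new to the idea, here is a minimal sketch of why index sorting helps, under stated assumptions: when a segment is sorted by the queried field, the matching documents form one contiguous doc ID range, so two binary searches are enough to find its boundaries. This is not Lucene's code. The class and method names are hypothetical, and a plain sorted array stands in for the per-document values of a sorted segment, with the array position playing the role of the doc ID.

```java
import java.util.Arrays;

// Hypothetical stand-in for one segment sorted by timestamp:
// the array index plays the role of the doc ID.
final class SortedRangeSketch {

    // First index whose value is >= target (values.length if there is none).
    static int lowerBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // First index whose value is > target (values.length if there is none).
    static int upperBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] <= target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Returns {firstDoc, lastDocExclusive} for min <= value <= max.
    static int[] matchingDocRange(long[] sortedValues, long min, long max) {
        if (min > max) {
            return new int[] {0, 0}; // empty range
        }
        return new int[] {lowerBound(sortedValues, min), upperBound(sortedValues, max)};
    }

    public static void main(String[] args) {
        long[] timestamps = {100, 105, 105, 110, 120, 130, 130, 140};
        int[] range = matchingDocRange(timestamps, 105, 130);
        System.out.println(Arrays.toString(range)); // prints [1, 7]
    }
}
```

Every doc ID in the returned half-open range matches, so no per-document check is needed afterwards. Whichever structure supplies the two boundaries (doc values or a BKD tree), the contiguous range is what makes the query cheap.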

Sep 15 '22 08:09 jpountz

Could you tell me which Lucene files I should read so that I can implement that algorithm?

Hi @tang-hi, I think you could start by reading IndexSortSortedNumericDocValuesRangeQuery; that should help you understand the paper better. I would also be more than happy to exchange notes about Lucene over WeChat.

Sep 15 '22 15:09 LuXugang

Thanks! I will read it!

Sep 18 '22 03:09 tang-hi