Optimization for time series data
Description
Hi, I recently read a VLDB paper that claims significant performance improvements over Lucene. It reports a 20x speedup on standard queries and a 10x speedup on histogram queries in massive log query scenarios.
After reading the whole paper, it seems that the core idea in this paper is similar to IndexSortSortedNumericDocValuesRangeQuery. Does anyone have time to read the paper and discuss it here?
Hi, LuXugang. I have roughly read that paper, and I think it has a lot of interesting optimizations for Lucene. I'm really interested in the reverse binary search algorithm for tail queries mentioned in the paper, although I am not very familiar with Lucene's query implementation😭. Could you tell me which Lucene files I should read so that I can implement that algorithm?
it seems that the core idea in this paper is similar to IndexSortSortedNumericDocValuesRangeQuery
This is my understanding as well, though it says it uses the BKD tree to figure out the range of doc IDs, not doc values, which seems to be the idea that is proposed at https://github.com/apache/lucene/pull/687 (which I just realized I had completely forgotten about :grimacing:).
Could you tell me which Lucene files I should read so that I can implement that algorithm?
Hi, @tang-hi. I think you could start by reading IndexSortSortedNumericDocValuesRangeQuery; it should help you understand the paper better. I would also be more than happy to connect on WeChat so we can learn about Lucene from each other.
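For anyone following along, here is a rough sketch of the idea as I understand it, not Lucene's actual code. When the index is sorted by the queried field (for example a timestamp), values are non-decreasing in doc ID order, so a range query reduces to two binary searches that yield one contiguous doc ID range. The class `SortedRangeSketch`, its methods, and the plain `long[]` standing in for per-document values are all hypothetical names made up for illustration.

```java
// Hypothetical helper, NOT part of Lucene: illustrates why a range query
// on an index-sorted field can be answered with two binary searches.
final class SortedRangeSketch {

    /** First index i in [0, values.length] such that values[i] >= target. */
    static int lowerBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** First index i in [0, values.length] such that values[i] > target. */
    static int upperBound(long[] values, long target) {
        int lo = 0, hi = values.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (values[mid] <= target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * Assumes valuesByDocId[doc] is the field value of document `doc` and that
     * the array is sorted ascending (the index-sort assumption).
     * Returns {firstDoc, lastDocExclusive} for values in [lower, upper].
     */
    static int[] matchingDocRange(long[] valuesByDocId, long lower, long upper) {
        if (lower > upper) {
            return new int[] {0, 0}; // empty range
        }
        int from = lowerBound(valuesByDocId, lower);
        int to = upperBound(valuesByDocId, upper);
        return new int[] {from, to};
    }
}
```

Calling `SortedRangeSketch.matchingDocRange(values, lower, upper)` on a sorted array gives a half-open doc ID interval, and every document inside it matches without checking values one by one. The real query class works against doc values and index sort metadata rather than an in-memory array, so treat this only as a mental model before reading the source.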
Thanks! I will read it!