rucene

Which Lucene version does the master branch correspond to?

Open · wenma opened this issue 4 years ago · 3 comments

I built an index with the master branch, and after reading the segments file I found the version is 6.4.18? Which Lucene version is this compatible with?

>> read(segments_1)

header length: 35
lucene version: 6.4.18
version: 4
nameCounter: 1
segCount: 1
...
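  1. Are the indexes generated by rucene fully compatible with native Lucene?
  2. Is there any benchmark data comparing rucene with native Lucene?
  3. Is there any test data for building indexes on distributed storage (I saw the slides you shared earlier)?
  4. How does rucene perform on the IO-heavy segment merge operation? Is there any data, especially for distributed storage?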

wenma · Dec 25 '19 07:12
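The `read(segments_1)` dump above comes from wenma's own tool, which isn't shown. As a rough illustration of where the `lucene version` line comes from, here is a minimal Python sketch that decodes the start of a `segments_N` file. It assumes the Lucene 6.x `SegmentInfos` layout: a `CodecUtil` index header followed by the writer's Lucene version as three VInts. The function names are hypothetical, and the sketch deliberately stops before the remaining fields (`version`, `nameCounter`, `segCount`, ...), whose encoding has changed between Lucene releases. Under that assumed layout the header of `segments_1` is 4 (magic) + 1 + 8 (codec name `segments`) + 4 (format version) + 16 (object ID) + 1 + 1 (generation suffix `1`) = 35 bytes, which is consistent with the `header length: 35` line in the dump.

```python
import struct
from typing import Dict, Tuple, Union

# CodecUtil.CODEC_MAGIC: every Lucene codec header starts with this big-endian int.
CODEC_MAGIC = 0x3FD76C17


def read_vint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a Lucene VInt (7 data bits per byte, low-order bits first,
    high bit set when more bytes follow). Returns (value, new_position)."""
    value, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, pos
        shift += 7


def parse_segments_header(buf: bytes) -> Dict[str, Union[int, str, Tuple[int, int, int]]]:
    """Hypothetical helper: decode the index header and writer version of a
    segments_N file, ASSUMING the Lucene 6.x SegmentInfos layout."""
    pos = 0
    (magic,) = struct.unpack_from(">I", buf, pos)
    pos += 4
    if magic != CODEC_MAGIC:
        raise ValueError(f"bad codec magic {magic:#x}")
    name_len, pos = read_vint(buf, pos)   # codec name: VInt length + UTF-8 bytes
    codec = buf[pos:pos + name_len].decode("utf-8")
    pos += name_len
    (format_version,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    pos += 16                             # 16-byte random object ID (skipped)
    suffix_len = buf[pos]                 # suffix length is a single byte
    suffix = buf[pos + 1:pos + 1 + suffix_len].decode("ascii")
    pos += 1 + suffix_len
    header_length = pos
    major, pos = read_vint(buf, pos)      # Lucene version recorded by the writer
    minor, pos = read_vint(buf, pos)
    bugfix, pos = read_vint(buf, pos)
    return {
        "codec": codec,
        "format_version": format_version,
        "suffix": suffix,
        "header_length": header_length,
        "lucene_version": (major, minor, bugfix),
    }
```

If the layout assumption holds, reading the issue's file with `parse_segments_header(open("segments_1", "rb").read())` would be expected to report `lucene_version` as `(6, 4, 18)` and `header_length` as 35.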

Thanks for your interest. To make this conversation useful for non-Chinese speakers, I'm going to answer your questions in English.

Q: Does Rucene produce indexes compatible with Lucene? A: Yes, it is fully compatible at this time. For a long time before write support was implemented, we used Rucene to serve online search with indices built by Lucene.

Q: Do you have a benchmark against official Lucene? A: There is no thorough benchmark yet, but I would say performance-wise they are similar as long as there is enough memory for the JVM. The biggest advantage of Rucene is deterministic response time: there are no GC-induced pauses and no cluster outages caused by frequent full GCs.

Q: Do you have benchmarks for building indexes on a distributed file system? A: The simple answer is no; we run it on locally provisioned volumes with K8S. However, we plan to deploy our search engine on Ceph in the future.

Q: Does segment merging pose a significant burden on the system? A: We run online index updates with it, and yes, running indexing and segment merging is not trivial. I would say the cost is pretty much the same as with Lucene.

sunxiaoguang · Dec 25 '19 16:12

> But I would say performance-wise they are similar as long as there is enough memory for the JVM.

For intersections, Rucene is a bit slower than Lucene but very close. For unions, on the other hand, Lucene is much faster: typically more than 10x faster when the block-max WAND optimisation can be used, and more than 2x faster otherwise.

Source: https://github.com/tantivy-search/search-benchmark-game https://tantivy-search.github.io/bench/

fulmicoton · Dec 27 '19 07:12
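To make the intersection/union distinction concrete: a conjunction can leapfrog through sorted posting lists, skipping ahead with binary search, while a top-k disjunction naively has to score every document that matches any term. WAND-style pruning skips documents whose best possible score (the sum of each matching term's maximum score) cannot beat the current k-th best hit; block-max WAND, the optimisation mentioned above, refines this with per-block maxima. The Python sketch below implements plain WAND, not block-max WAND, over hypothetical in-memory postings of `(doc_id, score)` pairs with non-negative scores. It is only an illustration of the technique, not how Lucene or Rucene store or score postings.

```python
import bisect
import heapq
from typing import Dict, List, Tuple

# Hypothetical toy postings: term -> list of (doc_id, score), sorted by doc_id,
# with non-negative scores. Real engines use compressed, block-encoded postings.
Postings = Dict[str, List[Tuple[int, float]]]


def intersect(lists: List[List[int]]) -> List[int]:
    """AND query: leapfrog over sorted doc-id lists using binary-search skips."""
    if not lists:
        return []
    pos = [0] * len(lists)
    out: List[int] = []
    while True:
        if any(p >= len(lst) for p, lst in zip(pos, lists)):
            return out
        # The largest current head is the only doc all lists might still share.
        candidate = max(lst[p] for p, lst in zip(pos, lists))
        for i, lst in enumerate(lists):
            pos[i] = bisect.bisect_left(lst, candidate, pos[i])  # skip to >= candidate
        if all(p < len(lst) and lst[p] == candidate for p, lst in zip(pos, lists)):
            out.append(candidate)
            pos = [p + 1 for p in pos]


def topk_exhaustive(postings: Postings, k: int) -> List[Tuple[float, int]]:
    """OR query without pruning: score every matching doc, then keep the best k."""
    scores: Dict[int, float] = {}
    for plist in postings.values():
        for doc, s in plist:
            scores[doc] = scores.get(doc, 0.0) + s
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(s, doc) for doc, s in ranked[:max(k, 0)]]


def topk_wand(postings: Postings, k: int) -> List[Tuple[float, int]]:
    """OR query with plain WAND pruning (a simplified ancestor of block-max WAND)."""
    if k <= 0:
        return []
    cursors = []  # each cursor: [position, doc_ids, scores, max_score_of_term]
    for plist in postings.values():
        if plist:
            scores = [s for _, s in plist]
            cursors.append([0, [d for d, _ in plist], scores, max(scores)])
    heap: List[Tuple[float, int]] = []  # min-heap of (score, -doc); root = worst hit
    while True:
        cursors = [c for c in cursors if c[0] < len(c[1])]
        if not cursors:
            break
        cursors.sort(key=lambda c: c[1][c[0]])
        threshold = heap[0][0] if len(heap) == k else float("-inf")
        # Pivot: first cursor at which the summed upper bounds could beat the threshold.
        acc, pivot = 0.0, None
        for i, c in enumerate(cursors):
            acc += c[3]
            if acc > threshold:
                pivot = i
                break
        if pivot is None:
            break  # no remaining document can enter the top k
        pivot_doc = cursors[pivot][1][cursors[pivot][0]]
        if cursors[0][1][cursors[0][0]] == pivot_doc:
            # Every cursor up to the pivot sits on pivot_doc: score it fully.
            score = 0.0
            for c in cursors:
                if c[1][c[0]] == pivot_doc:
                    score += c[2][c[0]]
                    c[0] += 1
            if len(heap) < k:
                heapq.heappush(heap, (score, -pivot_doc))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, -pivot_doc))
        else:
            # Docs before pivot_doc only match terms whose bounds sum to <= threshold.
            for c in cursors[:pivot]:
                c[0] = bisect.bisect_left(c[1], pivot_doc, c[0])
    return [(s, -neg) for s, neg in sorted(heap, key=lambda e: (-e[0], -e[1]))]
```

Both top-k functions break score ties toward the lower doc id, so `topk_wand` should return exactly what `topk_exhaustive` returns while potentially scoring fewer documents; how much it saves depends on how skewed the per-term maximum scores are.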


Thanks a lot for your great work, Paul. @tongjianlin, can we take a look at the benchmark and investigate the newer Lucene optimizations that we have been missing?

sunxiaoguang · Dec 28 '19 00:12