cassandra-lucene-index
Indexing is taking too long for 2 GB of data. Can anything be done?
I am trying to index 3 columns of a table holding 2 GB of data. The index build has been running in the background for more than 6 hours, and I still don't see an entry in system."IndexInfo". I am confused about what is happening in the background, and whether this plugin is the right candidate for heavy tables with large amounts of data.
1. How can I check the progress of index creation?
2. How frequently will the index be updated after its first build?
3. Is this plugin a good candidate for indexing a table with more than 250 GB of data?
-
You can watch the progress by changing the trace statements to INFO level and recompiling the plugin. See https://github.com/Stratio/cassandra-lucene-index/blob/branch-3.0.14/plugin/src/main/scala/com/stratio/cassandra/lucene/IndexWriter.scala
-
This depends on your settings. By default, a refresh is triggered every 60 seconds and scans for updates.
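The refresh interval mentioned above is set per index through the `refresh_seconds` option at creation time. Here is a minimal sketch in Python that only builds the `CREATE CUSTOM INDEX` statement as a string. The keyspace, table, index, and field names are hypothetical, and the helper `build_create_index_cql` is not part of the plugin; it is just an illustration of where the option goes.

```python
import json


def build_create_index_cql(keyspace, table, index_name, fields, refresh_seconds=60):
    """Build a CREATE CUSTOM INDEX statement for the Stratio Lucene plugin.

    All identifiers passed in are hypothetical examples; adapt them to
    your own schema. Nothing is sent to Cassandra here.
    """
    # The plugin takes its field mapping as a JSON string in the 'schema' option.
    schema = json.dumps({"fields": fields})
    # CQL string literals escape a single quote by doubling it.
    schema = schema.replace("'", "''")
    return (
        f"CREATE CUSTOM INDEX {index_name} ON {keyspace}.{table} ()\n"
        f"USING 'com.stratio.cassandra.lucene.Index'\n"
        f"WITH OPTIONS = {{\n"
        f"  'refresh_seconds': '{refresh_seconds}',\n"
        f"  'schema': '{schema}'\n"
        f"}};"
    )


if __name__ == "__main__":
    # Index only the columns that queries actually need (hypothetical names).
    stmt = build_create_index_cql(
        "demo", "events", "events_idx",
        {"name": {"type": "string"}, "body": {"type": "text"}},
        refresh_seconds=60,
    )
    print(stmt)
```

A smaller `refresh_seconds` makes writes searchable sooner, at the cost of more frequent refresh work.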
-
This comes down to partitioning. Keep your partition size no larger than 10 GB (C* 3.11), with strong CPUs and NVMe storage. 250 GB is a lot of data: even at 128 KB per row, that is still about 2 million rows. So index only what you need, and use filters to narrow the data set.
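To make the sizing figures concrete, here is a rough back-of-the-envelope sketch. It assumes that "128k" refers to an average row size of 128 KB and uses decimal units; both are assumptions on my part, and the helper names are hypothetical.

```python
GB = 10**9  # decimal gigabyte (assumption; binary units give a similar result)
KB = 10**3  # decimal kilobyte


def estimate_rows(table_bytes, avg_row_bytes):
    """Approximate row count for a table of a given size."""
    return table_bytes // avg_row_bytes


def min_partitions(table_bytes, max_partition_bytes):
    """Smallest number of partitions that keeps each one under the cap."""
    return -(-table_bytes // max_partition_bytes)  # ceiling division


if __name__ == "__main__":
    table_size = 250 * GB
    print(estimate_rows(table_size, 128 * KB))   # 1953125, roughly 2 million rows
    print(min_partitions(table_size, 10 * GB))   # 25 partitions at a 10 GB cap
```

In practice you would want far more than the minimum number of partitions, since the 10 GB figure is an upper bound rather than a target.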