cassandra-lucene-index

Very slow process of compaction after index setup

Open karpa13a opened this issue 6 years ago • 5 comments

Good day. C* is 3.11 with the matching plugin version, on Ubuntu 16.04 and Java 1.8 (latest update). One DC, 3 nodes, keyspace with RF=3, running on EC2 with 2 CPUs and 4 GB of memory each.

The cluster works well: data is inserted in batches every 15 minutes, there are no problems with compaction or performance, and the dataset is around 15M rows. However, I'm seeing strange behavior after creating a Lucene index. I created the index as follows:

CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"},
         place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}
      }
   }',
   'indexing_threads': '4'
};
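For context on what the `place` mapping is for: a search against it could look like the sketch below. It assumes the plugin's `expr()` search syntax for Cassandra 3.x and a `geo_distance` filter; the coordinates and distance are made-up values, not taken from this thread.

```cql
-- Hypothetical search: rows whose "place" (built from latitude/longitude)
-- lies within 10 km of a made-up point
SELECT * FROM gsm WHERE expr(gsm_index, '{
   filter: {
      type: "geo_distance",
      field: "place",
      latitude: 40.4168,
      longitude: -3.7038,
      max_distance: "10km"
   }
}');
```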

The index was created and worked well, but the next day I saw a load average (LA) above 3 on each node, with a queue of 8 pending compactions. I dropped the index and all compactions finished within 15 minutes. I recreated the index and got the same result the next day. The table is simple:

CREATE TABLE gsm (
   sid text,
   timestamp timestamp,
   latitude double,
   longitude double,
   /* other column definitions */
   PRIMARY KEY (sid, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
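To make the link between the table and the index mapping concrete, here is a hypothetical row (the `sid`, time and coordinates are made up, and the elided columns are left out). The geo_point mapper reads the two coordinate columns, so no separate `place` column is written:

```cql
-- Hypothetical row: only the primary key and the two coordinate columns are set.
-- The index derives its "place" field from latitude/longitude.
INSERT INTO gsm (sid, timestamp, latitude, longitude)
VALUES ('sid-001', '2018-05-10 10:00:00+0000', 40.4168, -3.7038);
```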

Do I need to upgrade to a more powerful EC2 instance, or have I hit a bug?

karpa13a, May 10 '18 10:05

What type of disks are you using? I alleviated similar compaction problems by switching to solid state drives.

FourSeventy, May 13 '18 23:05

@FourSeventy Unfortunately, it's not an I/O bottleneck; the tasks are CPU-bound.

karpa13a, May 16 '18 11:05

Unfortunately, upgrading the nodes from t2.medium (2 CPUs) to t2.xlarge (4 CPUs) didn't help; it just ate 350% CPU.

This makes Lucene indexes totally unusable for us.

Is there some kind of debugging I can do?

By the way, is it OK that MemtableFlushWriter writes to the log roughly every 2 minutes even when there are no reads or updates?

INFO  [MemtableFlushWriter:372] 2018-05-18 07:24:56,673 Index.scala:127 - Flushing Lucene index  /gsm_index/
INFO  [MemtableFlushWriter:373] 2018-05-18 07:26:00,154 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:374] 2018-05-18 07:27:57,105 Index.scala:127 - Flushing Lucene index /gsm_index/
INFO  [MemtableFlushWriter:375] 2018-05-18 07:29:52,975 Index.scala:127 - Flushing Lucene index /gsm_index/

karpa13a, May 18 '18 08:05

Okay, I recreated the index without the "place: {type: "geo_point", latitude: "latitude", longitude: "longitude"}" part, and now compactions don't get stuck.
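For reference, the workaround described above amounts to roughly the following. This is a sketch reconstructed from the original `CREATE CUSTOM INDEX` statement with only the `place` mapping removed; it assumes the session is already using the keyspace that holds `gsm`.

```cql
-- Drop the existing index before recreating it
DROP INDEX gsm_index;

-- Recreate it with the same options, minus the geo_point mapping for "place"
CREATE CUSTOM INDEX gsm_index ON gsm ()
USING 'com.stratio.cassandra.lucene.Index'
WITH OPTIONS = {
   'refresh_seconds': '1',
   'schema': '{
      fields: {
         sid: {type: "string"},
         timestamp: {type: "date", pattern: "yyyy/MM/dd"}
      }
   }',
   'indexing_threads': '4'
};
```

The trade-off is that geo searches on `place` are no longer available with this index.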

What was wrong with geo_point? Currently the index is flushed only once every 3 hours:

INFO [MemtableFlushWriter:508] 2018-05-20 12:00:02,154 Index.scala:127 - Flushing Lucene index ...
INFO [MemtableFlushWriter:515] 2018-05-20 15:00:02,968 Index.scala:127 - Flushing Lucene index ...

karpa13a, May 20 '18 16:05

So which Cassandra version and which plugin version should we use to avoid compatibility issues? Any suggestions?
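As a starting point when comparing versions, the running Cassandra release can be read from the system tables, as in this sketch. It only covers Cassandra itself; the plugin version has to be checked separately, e.g. from the name of the plugin JAR deployed with Cassandra.

```cql
-- Cassandra release of the node this session is connected to
SELECT release_version FROM system.local;

-- Cassandra release reported for every other node in the cluster
SELECT peer, release_version FROM system.peers;
```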

nirmalsinghkps, May 23 '18 21:05