
cassandra-stress: counter_write workload got stuck with small population seq

Open amoskong opened this issue 7 years ago • 1 comments

From: https://github.com/scylladb/scylla/issues/2790#issuecomment-346643769
I will retest with the latest Scylla.

Installation details
Scylla version (or git commit hash): 1.7.4-0.20170726.ff643e3, 2.0.rc4-0.20170903.6e6de34
Cluster size: 4
OS (RHEL/CentOS/Ubuntu/AWS AMI): CentOS7

Description
cassandra-stress gets stuck once the total ops count reaches the end of the population sequence (-pop seq=1..10000). Even with a short duration (such as duration=10s), it does not exit.
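
Because the run never exits on its own, a reproduction script is easier to work with if it enforces a hard deadline. The following is a minimal sketch, assuming a POSIX system. `run_with_deadline` and `REPRO_CMD` are illustrative names, not part of cassandra-stress, and the 60-second deadline mentioned in the comment is an arbitrary choice for a duration=10s run. The whole process group is killed because the process listing in this issue shows /usr/bin/cassandra-stress as a shell wrapper around a separate java process.

```python
import os
import signal
import subprocess

# Command line taken from this issue (the duration=10s variant).
REPRO_CMD = [
    "cassandra-stress", "counter_write", "no-warmup", "cl=QUORUM", "duration=10s",
    "-schema", "replication(factor=1) compaction(strategy=DateTieredCompactionStrategy)",
    "keyspace=keyspace2",
    "-port", "jmx=6868", "-mode", "cql3", "native",
    "-rate", "threads=10", "-pop", "seq=1..10000",
]


def run_with_deadline(cmd, deadline_s):
    """Run cmd and return (hung, returncode).

    hung is True when the command was still running at the deadline and
    had to be killed; returncode is None in that case.
    """
    # start_new_session=True makes the child a process group leader, so its
    # process group id equals its pid and child processes can be killed too.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return False, proc.wait(timeout=deadline_s)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        proc.wait()
        return True, None


# Example (needs cassandra-stress on PATH and a reachable cluster):
#   hung, rc = run_with_deadline(REPRO_CMD, 60)
```

A `hung` value of True for a run that should finish in about ten seconds would indicate the behaviour described here.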

Prepare
Create keyspace2 (tables: counter1, standard1)

CREATE KEYSPACE IF NOT EXISTS keyspace2 WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} AND durable_writes = true;

CREATE TABLE IF NOT EXISTS keyspace2.counter1 (
    key blob PRIMARY KEY,
    "C0" counter,
    "C1" counter,
    "C2" counter,
    "C3" counter,
    "C4" counter
) WITH COMPACT STORAGE
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL","rows_per_partition":"ALL"}'
    AND comment = ''
    AND compaction = {'class': 'SizeTieredCompactionStrategy'}
    AND compression = {}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Result
The command line stops producing output, and an exception is raised after some minutes.

$ cassandra-stress counter_write no-warmup cl=QUORUM duration=5m -schema 'replication(factor=1) compaction(strategy=DateTieredCompactionStrategy)' keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000 -node 10.240.0.4
Connected to cluster: longevity-50gb-4d-amosread-db-cluster-76310bd4, max pending requests per connection 128, max connections per host 8
Datatacenter: datacenter1; Host: /10.240.0.30; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.19; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.4; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.27; Rack: rack1
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Running COUNTER_WRITE with 10 threads 5 minutes
Failed to connect over JMX; not collecting these stats
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 840, 840, 840, 840, 10.4, 3.6, 35.5, 87.1, 136.1, 136.1, 1.0, 0.00000, 0, 0, 0, 0, 0, 0
total, 1641, 767, 767, 767, 13.0, 9.7, 28.7, 209.9, 263.6, 263.6, 2.0, 0.04577, 0, 0, 0, 0, 0, 0
total, 3026, 1380, 1380, 1380, 7.2, 2.4, 22.0, 33.0, 44.0, 84.0, 3.0, 0.12653, 0, 0, 0, 0, 0, 0
total, 4213, 1133, 1133, 1133, 8.5, 9.3, 22.0, 32.6, 44.0, 65.1, 4.1, 0.09444, 0, 0, 0, 0, 0, 0
total, 5630, 1365, 1365, 1365, 7.5, 2.7, 22.0, 34.9, 76.0, 83.4, 5.1, 0.08485, 0, 0, 0, 0, 0, 0
total, 6626, 977, 977, 977, 10.2, 9.0, 26.5, 86.4, 122.9, 122.9, 6.2, 0.07556, 0, 0, 0, 0, 0, 0
total, 7659, 1020, 1020, 1020, 9.9, 10.2, 22.9, 45.9, 96.4, 102.6, 7.2, 0.06565, 0, 0, 0, 0, 0, 0
total, 9205, 1520, 1520, 1520, 6.6, 1.5, 21.9, 32.3, 46.1, 46.2, 8.2, 0.06942, 0, 0, 0, 0, 0, 0

java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
        at org.apache.cassandra.stress.util.Timing.snap(Timing.java:98)
        at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
        at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:37)
        at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
        at java.lang.Thread.run(Thread.java:748)
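
To check how close a run got to the end of the key range before it stalled, the interval lines can be parsed for the cumulative "total ops" column. This is a rough sketch, not part of cassandra-stress: `total_ops` and `remaining_keys` are hypothetical helper names, and it assumes one key per operation (pk/s equals op/s in the output above). `SAMPLE` holds three lines copied from the run above.

```python
SAMPLE = """\
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
total, 840, 840, 840, 840, 10.4, 3.6, 35.5, 87.1, 136.1, 136.1, 1.0, 0.00000, 0, 0, 0, 0, 0, 0
total, 7659, 1020, 1020, 1020, 9.9, 10.2, 22.9, 45.9, 96.4, 102.6, 7.2, 0.06565, 0, 0, 0, 0, 0, 0
total, 9205, 1520, 1520, 1520, 6.6, 1.5, 21.9, 32.3, 46.1, 46.2, 8.2, 0.06942, 0, 0, 0, 0, 0, 0
"""


def total_ops(stress_output):
    """Return the cumulative 'total ops' value of every interval line."""
    totals = []
    for line in stress_output.splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Interval lines start with "total"; the header starts with "type".
        if len(fields) > 1 and fields[0] == "total" and fields[1].isdigit():
            totals.append(int(fields[1]))
    return totals


def remaining_keys(stress_output, population):
    """Keys of the seq range not yet covered by the last reported interval."""
    totals = total_ops(stress_output)
    return population - totals[-1] if totals else population


print(remaining_keys(SAMPLE, 10000))
```

For the run above, the last reported interval stands at 9205 ops, so the output stops shortly before the 10000-key range would be used up.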

amoskong avatar Nov 23 '17 22:11 amoskong

Reproduced with recent master.

scylla-jmx-666.development-20171121.f4ef4a5.el7.centos.noarch
scylla-conf-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-tools-core-666.development-20171121.c4ba9fc.el7.centos.noarch
scylla-server-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-tools-666.development-20171121.c4ba9fc.el7.centos.noarch
scylla-666.development-0.20171121.c1b97d1.el7.centos.x86_64
scylla-kernel-conf-666.development-0.20171121.c1b97d1.el7.centos.x86_64
$ cassandra-stress counter_write no-warmup cl=QUORUM duration=10s -schema 'replication(factor=1) compaction(strategy=DateTieredCompactionStrategy)' keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
WARN  02:25:56 You listed localhost/0:0:0:0:0:0:0:1:9042 in your contact points, but it wasn't found in the control host's system.peers at startup
Connected to cluster: Test Cluster, max pending requests per connection 128, max connections per host 8
Datatacenter: datacenter1; Host: localhost/127.0.0.1; Rack: rack1
Created keyspaces. Sleeping 1s for propagation.
Sleeping 2s...
Running COUNTER_WRITE with 10 threads 10 seconds
Failed to connect over JMX; not collecting these stats
type,      total ops,    op/s,    pk/s,   row/s,    mean,     med,     .95,     .99,    .999,     max,   time,   stderr, errors,  gc: #,  max ms,  sum ms,  sdv ms,      mb
total,          1415,    1415,    1415,    1415,     6.8,     6.4,    13.0,    18.7,    29.8,    31.5,    1.0,  0.00000,      0,      0,       0,       0,       0,       0
total,          3542,    2010,    2010,    2010,     4.9,     4.5,     9.3,    12.1,    24.8,    25.9,    2.1,  0.11598,      0,      0,       0,       0,       0,       0
total,          6759,    3136,    3136,    3136,     3.1,     2.9,     6.6,     8.4,    10.8,    12.5,    3.1,  0.18488,      0,      0,       0,       0,       0,       0
<waited for some minutes...>
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
        at org.apache.cassandra.stress.util.Timing.snap(Timing.java:98)
        at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
        at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:37)
        at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
        at java.lang.Thread.run(Thread.java:748)
<stuck, does not return...>
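
The exception is raised from Timing.snap, called from StressMetrics.update, and its text suggests that the metrics thread waits for every worker's timer to report and gives up after a timeout. The sketch below only illustrates that general wait-with-timeout pattern in Python. It is not the cassandra-stress implementation, and `snap` and `worker` are hypothetical names chosen to mirror the stack trace.

```python
import threading
import time


def snap(report_events, timeout_s):
    """Wait until every worker has reported, or raise after timeout_s overall."""
    deadline = time.monotonic() + timeout_s
    for event in report_events:
        remaining = max(deadline - time.monotonic(), 0.0)
        # Event.wait returns False when the timeout expires first.
        if not event.wait(remaining):
            raise RuntimeError("timed out waiting for a worker report")


def worker(event, finishes):
    """A healthy worker reports by setting its event; a stuck one never does."""
    if finishes:
        event.set()
```

If one worker never reports, the collector blocks for the full timeout and then fails, which would match a run that goes silent for some minutes before printing the exception.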

cassandra-stress (c-s) process status:

amos      6433  0.0  0.0 113128  1568 pts/4    S+   02:25   0:00 /bin/sh /usr/bin/cassandra-stress counter_write no-warmup cl=QUORUM duration=10s -schema replication(factor=1) compaction(strategy=DateTieredCompactionStrategy) keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
amos      6443  182  6.9 4554116 535888 pts/4  Sl+  02:25   7:22 /bin/java -server -ea -cp /tmp/tmp.5YXcq3OEDz:/usr/share/scylla/cassandra/lib/airline-0.6.jar:/usr/share/scylla/cassandra/lib/antlr-runtime-3.5.2.jar:/usr/share/scylla/cassandra/lib/asm-5.0.4.jar:/usr/share/scylla/cassandra/lib/cassandra-driver-core-3.0.1-shaded.jar:/usr/share/scylla/cassandra/lib/commons-cli-1.1.jar:/usr/share/scylla/cassandra/lib/commons-codec-1.2.jar:/usr/share/scylla/cassandra/lib/commons-lang3-3.1.jar:/usr/share/scylla/cassandra/lib/commons-math3-3.2.jar:/usr/share/scylla/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/scylla/cassandra/lib/concurrentlinkedhashmap-lru-1.4.jar:/usr/share/scylla/cassandra/lib/disruptor-3.0.1.jar:/usr/share/scylla/cassandra/lib/ecj-4.4.2.jar:/usr/share/scylla/cassandra/lib/guava-18.0.jar:/usr/share/scylla/cassandra/lib/high-scale-lib-1.0.6.jar:/usr/share/scylla/cassandra/lib/jackson-core-asl-1.9.2.jar:/usr/share/scylla/cassandra/lib/jackson-mapper-asl-1.9.2.jar:/usr/share/scylla/cassandra/lib/jamm-0.3.0.jar:/usr/share/scylla/cassandra/lib/javax.inject.jar:/usr/share/scylla/cassandra/lib/jbcrypt-0.3m.jar:/usr/share/scylla/cassandra/lib/jcl-over-slf4j-1.7.7.jar:/usr/share/scylla/cassandra/lib/jna-4.0.0.jar:/usr/share/scylla/cassandra/lib/joda-time-2.4.jar:/usr/share/scylla/cassandra/lib/json-simple-1.1.jar:/usr/share/scylla/cassandra/lib/libthrift-0.9.2.jar:/usr/share/scylla/cassandra/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/scylla/cassandra/lib/logback-classic-1.1.3.jar:/usr/share/scylla/cassandra/lib/logback-core-1.1.3.jar:/usr/share/scylla/cassandra/lib/lz4-1.3.0.jar:/usr/share/scylla/cassandra/lib/metrics-core-3.1.0.jar:/usr/share/scylla/cassandra/lib/metrics-jvm-3.1.0.jar:/usr/share/scylla/cassandra/lib/metrics-logback-3.1.0.jar:/usr/share/scylla/cassandra/lib/netty-all-4.0.23.Final.jar:/usr/share/scylla/cassandra/lib/ohc-core-0.4.3.jar:/usr/share/scylla/cassandra/lib/ohc-core-j8-0.4.3.jar:/usr/share/scylla/cassandra/lib/reporter-config3-3.0.0.jar
:/usr/share/scylla/cassandra/lib/reporter-config-base-3.0.0.jar:/usr/share/scylla/cassandra/lib/sigar-1.6.4.jar:/usr/share/scylla/cassandra/lib/slf4j-api-1.7.7.jar:/usr/share/scylla/cassandra/lib/snakeyaml-1.11.jar:/usr/share/scylla/cassandra/lib/snappy-java-1.1.1.7.jar:/usr/share/scylla/cassandra/lib/ST4-4.0.8.jar:/usr/share/scylla/cassandra/lib/stream-2.5.2.jar:/usr/share/scylla/cassandra/lib/thrift-server-0.3.7.jar:/usr/share/scylla/cassandra/apache-cassandra-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/apache-cassandra.jar:/usr/share/scylla/cassandra/apache-cassandra-thrift-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/scylla-tools-3.0.8-SNAPSHOT.jar:/usr/share/scylla/cassandra/stress.jar: -Dcassandra.storagedir= -Dlogback.configurationFile=logback-tools.xml org.apache.cassandra.stress.Stress counter_write no-warmup cl=QUORUM duration=10s -schema replication(factor=1) compaction(strategy=DateTieredCompactionStrategy) keyspace=keyspace2 -port jmx=6868 -mode cql3 native -rate threads=10 -pop seq=1..10000
$ strace -p 6433
strace: Process 6433 attached
wait4(-1, 

$ strace -p 6443
strace: Process 6443 attached
futex(0x7f8e8089f9d0, FUTEX_WAIT, 6444, NULL
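
The ps output above shows the java process at 182% CPU, while strace on pid 6443 only shows a thread blocked in futex. Without -f, strace -p attaches to that single thread, so other threads are probably the ones consuming CPU. A minimal sketch for listing per-thread state and CPU time from /proc on Linux follows; `parse_task_stat` and `thread_states` are illustrative names, not an existing tool.

```python
from pathlib import Path


def parse_task_stat(text):
    """Parse one /proc/<pid>/task/<tid>/stat line.

    Returns (tid, name, state, cpu_ticks), where cpu_ticks is utime + stime.
    """
    # The thread name is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the last ")" rather than on whitespace.
    head, _, tail = text.rpartition(")")
    tid, _, name = head.partition("(")
    rest = tail.split()
    # rest[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(tid), name, rest[0], int(rest[11]) + int(rest[12])


def thread_states(pid):
    """Return all threads of pid, busiest first."""
    rows = []
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            rows.append(parse_task_stat(stat.read_text()))
        except (OSError, ValueError, IndexError):
            continue  # the thread exited or the line was unreadable
    return sorted(rows, key=lambda row: row[3], reverse=True)


# Example: for row in thread_states(6443): print(row)
```

Threads in state R with a growing tick count are the ones worth inspecting further, for example with a Java thread dump.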

amoskong avatar Nov 27 '17 02:11 amoskong