scylla-tools-java
RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
$ rpm -qa |grep scylla
scylla-tools-2.3.1-20181021.823346d3b0.el7.noarch
scylla-conf-2.3.1-0.20181021.336c77166.el7.x86_64
scylla-tools-core-2.3.1-20181021.823346d3b0.el7.noarch
Steps:
- create a cluster with 4 nodes
- fill data with:
cassandra-stress user no-warmup profile=/tmp/sst3_schema.yaml ops'(insert=1)' cl=QUORUM n=10000000 -rate threads=1000 -pop seq=1..10000000
- verify data with read1:
cassandra-stress user no-warmup profile=/tmp/sst3_schema.yaml ops'(read1=1)' cl=ALL n=10000000 -rate threads=1000 -pop seq=1..10000000
- verify data with multiple workload ops:
cassandra-stress user no-warmup profile=/tmp/sst3_schema.yaml ops'(read1=1,read2=1,update_static=1,update_ttl=1,update_diff1_ts=1,update_diff2_ts=1,update_same1_ts=1,update_same2_ts=1)' cl=ALL n=10000000 -rate threads=200 -pop seq=1..10000000
The problem can be reproduced with either of:
- ops'(read1=1,read2=1,update_static=1,update_ttl=1,update_diff1_ts=1,update_diff2_ts=1,update_same1_ts=1,update_same2_ts=1,alter_table=0)'
- ops'(read1=1,read2=1,update_static=1,update_ttl=1,update_diff1_ts=1,update_diff2_ts=1,update_same1_ts=1,update_same2_ts=1,alter_table=1)'
If I remove alter_table=* from the ops parameter, the problem can't be reproduced:
- ops'(read1=1,read2=1,update_static=1,update_ttl=1,update_diff1_ts=1,update_diff2_ts=1,update_same1_ts=1,update_same2_ts=1)'
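To get a feel for how much schema churn the alter_table=1 variant generates: as far as we understand, cassandra-stress treats the ops weights as relative ratios. The sketch below assumes each operation is drawn in proportion to its weight and computes the expected number of executions per query for n=10000000. The OpMix class and the expectedCounts helper are hypothetical, written only for this illustration; they are not part of cassandra-stress.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of cassandra-stress): expected executions per op,
// ASSUMING ops are sampled in proportion to their ratio weights.
public class OpMix {
    static Map<String, Long> expectedCounts(Map<String, Integer> weights, long n) {
        long total = 0;
        for (int w : weights.values()) {
            total += w;
        }
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : weights.entrySet()) {
            // Integer division floors the expectation; a zero total yields zero counts.
            counts.put(e.getKey(), total == 0 ? 0L : n * e.getValue() / total);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> weights = new LinkedHashMap<>();
        String[] ops = {"read1", "read2", "update_static", "update_ttl",
                "update_diff1_ts", "update_diff2_ts", "update_same1_ts",
                "update_same2_ts", "alter_table"};
        for (String op : ops) {
            weights.put(op, 1); // every op has weight 1, as in the alter_table=1 run
        }
        Map<String, Long> counts = expectedCounts(weights, 10_000_000L);
        System.out.println(counts.get("alter_table"));
    }
}
```

Running main prints 1111111: under this assumption, a full run would issue roughly 1.1 million ALTER TABLE statements, about one in every nine operations.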
The complete schema file is attached: sst3_schema.yaml.txt
$ cassandra-stress user no-warmup profile=/tmp/sst3_schema.yaml ops'(read1=1,read2=1,update_static=1,update_ttl=1,update_diff1_ts=1,update_diff2_ts=1,update_same1_ts=1,update_same2_ts=1,alter_table=0)' cl=ALL n=10000000 -rate threads=200 -pop seq=1..10000000 -node 10.240.0.125
WARN 13:09:06 Only 49322 MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Connected to cluster: rolling-upgrade-l-ak-30-db-cluster-aecb964b, max pending requests per connection 128, max connections per host 8
Datatacenter: datacenter1; Host: /10.240.0.143; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.139; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.140; Rack: rack1
Datatacenter: datacenter1; Host: /10.240.0.125; Rack: rack1
Created schema. Sleeping 1s for propagation.
Created extra schema. Sleeping 1s for propagation.
Sleeping 2s...
Running [read1, read2, update_static, update_ttl, update_diff1_ts, update_diff2_ts, update_same1_ts, update_same2_ts, alter_table] with 200 threads for 10000000 iteration
Failed to connect over JMX; not collecting these stats
type, total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb
java.lang.RuntimeException: Timed out waiting for a timer thread - seems one got stuck. Check GC/Heap size
at org.apache.cassandra.stress.util.Timing.snap(Timing.java:98)
at org.apache.cassandra.stress.StressMetrics.update(StressMetrics.java:156)
at org.apache.cassandra.stress.StressMetrics.access$300(StressMetrics.java:37)
at org.apache.cassandra.stress.StressMetrics$2.run(StressMetrics.java:104)
at java.lang.Thread.run(Thread.java:748)
FAILURE
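The stack trace shows the failure is raised from the periodic metrics reporting path (StressMetrics.update calling Timing.snap). The sketch below is a minimal model of that general pattern, not the actual Timing.java code. It assumes that the reporter asks every worker thread for a timing snapshot, waits on a latch with a timeout, and that a worker can only answer between operations. Under those assumptions, a single worker blocked inside one long operation (or a JVM paused by GC) is enough to make the whole snapshot time out. The class names, the 500 ms timeout, and the simulated operation durations are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model: a reporter collects snapshots from all workers, with a timeout.
public class SnapshotTimeout {
    static final class Worker implements Runnable {
        private final AtomicReference<CountDownLatch> pending = new AtomicReference<>();
        private final long opMillis; // simulated duration of one operation

        Worker(long opMillis) { this.opMillis = opMillis; }

        void requestReport(CountDownLatch latch) { pending.set(latch); }

        @Override
        public void run() {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(opMillis); // a stuck operation would block here
                } catch (InterruptedException e) {
                    return;
                }
                // Between operations: answer a pending report request, if any.
                CountDownLatch latch = pending.getAndSet(null);
                if (latch != null) {
                    latch.countDown();
                }
            }
        }
    }

    // Returns if every worker reports within the timeout; otherwise throws.
    static void snap(List<Worker> workers, long timeoutMillis) throws InterruptedException {
        CountDownLatch ready = new CountDownLatch(workers.size());
        for (Worker w : workers) {
            w.requestReport(ready);
        }
        if (!ready.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new RuntimeException("Timed out waiting for a timer thread - seems one got stuck");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<Worker> workers = new ArrayList<>();
        workers.add(new Worker(10));     // fast worker: answers almost immediately
        workers.add(new Worker(60_000)); // slow worker: one operation takes a minute
        for (Worker w : workers) {
            Thread t = new Thread(w);
            t.setDaemon(true); // let the JVM exit while the slow worker is still busy
            t.start();
        }
        try {
            snap(workers, 500);
            System.out.println("snapshot ok");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With only the fast worker the snapshot succeeds. Adding the slow worker makes snap throw, even though the fast worker keeps making progress, which matches the "seems one got stuck" wording of the real error.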
insert:
  partitions: fixed(1)
  select: fixed(1)/1000
  batchtype: UNLOGGED
queries:
  read1:
    cql: select * from user_with_ck where key = ? LIMIT 1
    fields: samerow
  read2:
    cql: select * from user_with_ck where key = ? and md5 = ? LIMIT 1
    fields: samerow
  update_static:
    cql: update user_with_ck USING TTL 5 set static_int = ? where key = ?
    fields: samerow
  update_ttl:
    cql: update user_with_ck USING TTL 5 set check_date = ? where key = ? and md5 = ?
    fields: samerow
  update_diff1_ts:
    cql: update user_with_ck USING TIMESTAMP 10 set check_date = ? where key = ? and md5 = ?
    fields: samerow
  update_diff2_ts:
    cql: update user_with_ck USING TIMESTAMP 5 set pass_date = ? where key = ? and md5 = ?
    fields: samerow
  update_same1_ts:
    cql: update user_with_ck USING TIMESTAMP 10 set check_date = '2018-01-01T11:21:59.001+0000' where key = ? and md5 = ?
    fields: samerow
  update_same2_ts:
    cql: update user_with_ck USING TIMESTAMP 10 set pass_date = '2018-01-01T11:21:59.001+0000' where key = ? and md5 = ?
    fields: samerow
  alter_table:
    cql: ALTER TABLE user_with_ck WITH comment = 'updated'
    fields: samerow
  delete_row:
    cql: delete from user_with_ck where key = ? and md5 = ?
    fields: samerow
The Grafana snapshot (from when c-s started until c-s failed):
I think this is a c-s issue. I assume it would also fail against Cassandra; can we test that?
I don't think this is a regression, i.e., there was no earlier release on which this worked.