Result file was not created for successful/failed run of gemini
gemini tool : v.1.6.9 gemini cmd:
/$HOME/gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.67.95 --outfile /home/centos/gemini_result_a8d56584-9884-4c15-9747-59856e8d0791.log --seed 70 --oracle-cluster=10.0.129.183 '
Gemini finished exit status 0, but result log file was not created:
during run several actions for test cluster node were applyed: NoCorruptRepair TerminateAndReplaceNode CorruptThenRepair TruncateMonkey AbortRepairMonkey
from output:
{"L":"INFO","T":"2020-04-22T09:07:43.538Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-04-22T09:37:43.661Z","M":"Warmup done"}
2020/04/22 10:17:17 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
2020/04/22 10:17:54 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"generator","M":"stopping partition key generation loop","keys_created":13493536991,"keys_emitted":4808295}
{"L":"INFO","T":"2020-04-22T12:37:51.941Z","M":"All jobs complete"}
(END)
Gemini detect error, but result file was not created and no any describe messages in log: gemini command:
gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.149.48 --outfile /home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log --seed 36 --oracle-cluster=10.0.138.207
gemini result in console log:
Seed: 36
Maximum duration: 3h0m0s
Warmup duration: 30m0s
Concurrency: 100
Test cluster: [10.0.149.48]
Oracle cluster: [10.0.138.207]
Output file: /home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log
Schema: {
"keyspace": {
"name": "ks1",
"replication": {
"class": "SimpleStrategy",
"replication_factor": "3"
},
"oracle_replication": {
"class": "SimpleStrategy",
"replication_factor": "1"
}
},
"tables": [
{
"name": "table1",
"partition_keys": [
{
"name": "pk0",
"type": "tinyint"
},
{
"name": "pk1",
"type": "tinyint"
},
{
"name": "pk2",
"type": "tinyint"
}
],
"clustering_keys": [
{
"name": "ck0",
"type": "blob"
},
{
"name": "ck1",
"type": "decimal"
}
],
"columns": [
{
"name": "col0",
"type": "blob"
},
{
"name": "col1",
"type": "uuid"
},
{
"name": "col2",
"type": "inet"
},
{
"name": "col3",
"type": {
"types": [
"date",
"decimal",
"decimal",
"timestamp",
"double",
"float",
"float",
"timestamp"
],
"frozen": false
}
},
{
"name": "col4",
"type": {
"key_type": "bigint",
"value_type": "double",
"frozen": false
}
},
{
"name": "col5",
"type": {
"key_type": "bigint",
"value_type": "tinyint",
"frozen": false
}
},
{
"name": "col6",
"type": {
"types": [
"smallint",
"timestamp",
"decimal",
"inet",
"int",
"inet",
"varint",
"decimal",
"uuid"
],
"frozen": true
}
},
{
"name": "col7",
"type": "float"
},
{
"name": "col8",
"type": "varchar"
},
{
"name": "col9",
"type": "tinyint"
},
{
"name": "col10",
"type": {
"key_type": "date",
"value_type": "decimal",
"frozen": true
}
},
{
"name": "col11",
"type": {
"types": {
"udt_3606525730_0": "double",
"udt_3606525730_1": "varint",
"udt_3606525730_2": "duration",
"udt_3606525730_3": "timeuuid",
"udt_3606525730_4": "text",
"udt_3606525730_5": "tinyint",
"udt_3606525730_6": "timeuuid",
"udt_3606525730_7": "tinyint",
"udt_3606525730_8": "float"
},
"type_name": "udt_3606525730",
"frozen": true
}
},
{
"name": "col12",
"type": "ascii"
}
],
"indexes": [
{
"name": "col7_idx",
"column": "col7",
"column_idx": 7
},
{
"name": "col9_idx",
"column": "col9",
"column_idx": 9
}
],
"materialized_views": [
{
"name": "table1_mv_0",
"partition_keys": [
{
"name": "col1",
"type": "uuid"
},
{
"name": "pk0",
"type": "tinyint"
},
{
"name": "pk1",
"type": "tinyint"
},
{
"name": "pk2",
"type": "tinyint"
}
],
"clustering_keys": [
{
"name": "ck0",
"type": "blob"
},
{
"name": "ck1",
"type": "decimal"
}
]
}
],
"known_issues": {
"https://github.com/scylladb/scylla/issues/3708": true
},
"table_options": null
}
]
}
{"L":"INFO","T":"2020-05-05T20:48:35.366Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-05-05T21:18:35.629Z","M":"Warmup done"}
{"L":"WARN","T":"2020-05-05T22:16:08.263Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"generator","M":"stopping partition key generation loop","keys_created":11285996639,"keys_emitted":4351495}
{"L":"INFO","T":"2020-05-06T00:18:43.688Z","M":"All jobs complete"}
Error: gemini encountered errors, exiting with non zero status
gemini encountered errors, exiting with non zero status
When error detected, on one of test cluster node was running ENOSPC nemesis( simulate the end of space on node). But 2 other nodes are running without errors and not any errors in logs on them.
This should be handled better now after application of PR #239 perhaps we can close this and repoen it after trying in the upcoming release @aleksbykov ?
@dahankzter yes we can close it
@dahankzter , unfortunately the issue is still reproduced.Not always
Seed: 54
Maximum duration: 3h0m0s
Warmup duration: 30m0s
Concurrency: 100
Test cluster: [10.0.252.39]
Oracle cluster: [10.0.130.36]
Output file: /home/centos/gemini_result_dcaeb75a-f9d5-4dc7-9008-7f15586bdd49.log
Schema: {
"keyspace": {
"name": "ks1",
"replication": {
"class": "SimpleStrategy",
"replication_factor": "3"
},
"oracle_replication": {
"class": "SimpleStrategy",
"replication_factor": "1"
}
},
"tables": [
{
"name": "table1",
"partition_keys": [
{
"name": "pk0",
"type": "smallint"
},
{
"name": "pk1",
"type": "smallint"
},
{
"name": "pk2",
"type": "tinyint"
}
],
"clustering_keys": [
{
"name": "ck0",
"type": "smallint"
},
{
"name": "ck1",
"type": "float"
}
],
"columns": [
{
"name": "col0",
"type": {
"types": {
"udt_3625021091_0": "tinyint",
"udt_3625021091_1": "timeuuid",
"udt_3625021091_2": "inet"
},
"type_name": "udt_3625021091",
"frozen": true
}
},
{
"name": "col1",
"type": {
"types": [
"timeuuid",
"date",
"uuid",
"text",
"inet",
"varchar",
"date",
"text",
"inet",
"inet",
"varchar"
],
"frozen": false
}
},
{
"name": "col2",
"type": "varchar"
},
{
"name": "col3",
"type": "varint"
},
{
"name": "col4",
"type": {
"key_type": "varint",
"value_type": "text",
"frozen": true
}
},
{
"name": "col5",
"type": "varint"
},
{
"name": "col6",
"type": "bigint"
},
{
"name": "col7",
"type": "date"
},
{
"name": "col8",
"type": {
"types": {
"udt_3663736436_0": "uuid",
"udt_3663736436_1": "blob",
"udt_3663736436_2": "tinyint",
"udt_3663736436_3": "boolean",
"udt_3663736436_4": "timeuuid",
"udt_3663736436_5": "float",
"udt_3663736436_6": "timeuuid",
"udt_3663736436_7": "duration",
"udt_3663736436_8": "text"
},
"type_name": "udt_3663736436",
"frozen": true
}
},
{
"name": "col9",
"type": "float"
},
{
"name": "col10",
"type": "varchar"
},
{
"name": "col11",
"type": "float"
},
{
"name": "col12",
"type": "float"
}
],
"indexes": [
{
"name": "table1_col3_idx",
"column": "col3",
"column_idx": 3
},
{
"name": "table1_col5_idx",
"column": "col5",
"column_idx": 5
},
{
"name": "table1_col9_idx",
"column": "col9",
"column_idx": 9
},
{
"name": "table1_col11_idx",
"column": "col11",
"column_idx": 11
},
{
"name": "table1_col12_idx",
"column": "col12",
"column_idx": 12
}
],
"materialized_views": [
{
"name": "table1_mv_0",
"partition_keys": [
{
"name": "col10",
"type": "varchar"
},
{
"name": "pk0",
"type": "smallint"
},
{
"name": "pk1",
"type": "smallint"
},
{
"name": "pk2",
"type": "tinyint"
}
],
"clustering_keys": [
{
"name": "ck0",
"type": "smallint"
},
{
"name": "ck1",
"type": "float"
}
]
}
],
"known_issues": {
"https://github.com/scylladb/scylla/issues/3708": true
}
}
]
}
{"L":"INFO","T":"2020-07-02T10:02:00.806Z","N":"generator","M":"starting partition key generation loop"}
{"L":"WARN","T":"2020-07-02T10:32:07.757Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:07.757Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:09.269Z","M":"result channel closed"}
gemini encountered errors, exiting with non zero status
Error: gemini encountered errors, exiting with non zero status
@slivne / @penberg do you have someone that can handle this? Some jobs are failing and we can't tell why and where was the error. It could hide a real scylla bug.
@roydahan, @aleksbykov , is this still the case ? I see we are doing .Sync on the end, so it should work properly, unless there is underlying golang issue that makes it loose data on exit.
@dkropachev , yes, again got such results. Looks like gemini stucked on exiting: Job: https://jenkins.scylladb.com/job/enterprise-2023.1/job/gemini-/job/gemini-3h-with-nemesis-test/10/
gemini output:
}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}
(E}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}
and there is no result file.
@aleksbykov , Absense of result channel closed gives me a clue that workgroup did not quit on waiting:
if err := g.Wait(); err != nil {
logger.Debug("error detected", zap.Error(err))
}
That part was redone using different mechanism at https://github.com/scylladb/gemini/commit/ef29ea56aab952d72ef0162db4ffc14f864c52f5
So, i could have gone on v.1.8.0, could you please upgrade gemini on 2023.1 branch and let's see if it happens there.
@aleksbykov , did it happen recently ?