gemini icon indicating copy to clipboard operation
gemini copied to clipboard

Result file was not created for successful/failed run of gemini

Open aleksbykov opened this issue 5 years ago • 9 comments

gemini tool : v.1.6.9 gemini cmd:

/$HOME/gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.67.95 --outfile /home/centos/gemini_result_a8d56584-9884-4c15-9747-59856e8d0791.log --seed 70 --oracle-cluster=10.0.129.183 '

Gemini finished exit status 0, but result log file was not created:

during run several actions for test cluster node were applyed: NoCorruptRepair TerminateAndReplaceNode CorruptThenRepair TruncateMonkey AbortRepairMonkey

from output:

{"L":"INFO","T":"2020-04-22T09:07:43.538Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-04-22T09:37:43.661Z","M":"Warmup done"}
2020/04/22 10:17:17 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
2020/04/22 10:17:54 error: failed to connect to 10.0.80.120:9042 due to error: gocql: no response received from cassandra within timeout period
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-04-22T12:37:43.542Z","N":"generator","M":"stopping partition key generation loop","keys_created":13493536991,"keys_emitted":4808295}
{"L":"INFO","T":"2020-04-22T12:37:51.941Z","M":"All jobs complete"}
(END)

aleksbykov avatar Apr 23 '20 10:04 aleksbykov

Gemini detect error, but result file was not created and no any describe messages in log: gemini command:

gemini -d --duration 3h --warmup 30m -c 100 -m mixed -f --non-interactive --cql-features normal --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --async-objects-stabilization-attempts 5 --async-objects-stabilization-backoff 500ms --replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '3'}" --oracle-replication-strategy "{'class': 'SimpleStrategy', 'replication_factor': '1'}" --test-cluster=10.0.149.48 --outfile /home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log --seed 36 --oracle-cluster=10.0.138.207

gemini result in console log:

Seed:			36
Maximum duration:	3h0m0s
Warmup duration:	30m0s
Concurrency:		100
Test cluster:		[10.0.149.48]
Oracle cluster:		[10.0.138.207]
Output file:		/home/centos/gemini_result_718be378-02bc-427e-b1bb-151e8f4310ef.log
Schema: {
    "keyspace": {
        "name": "ks1",
        "replication": {
            "class": "SimpleStrategy",
            "replication_factor": "3"
        },
        "oracle_replication": {
            "class": "SimpleStrategy",
            "replication_factor": "1"
        }
    },
    "tables": [
        {
            "name": "table1",
            "partition_keys": [
                {
                    "name": "pk0",
                    "type": "tinyint"
                },
                {
                    "name": "pk1",
                    "type": "tinyint"
                },
                {
                    "name": "pk2",
                    "type": "tinyint"
                }
            ],
            "clustering_keys": [
                {
                    "name": "ck0",
                    "type": "blob"
                },
                {
                    "name": "ck1",
                    "type": "decimal"
                }
            ],
            "columns": [
                {
                    "name": "col0",
                    "type": "blob"
                },
                {
                    "name": "col1",
                    "type": "uuid"
                },
                {
                    "name": "col2",
                    "type": "inet"
                },
                {
                    "name": "col3",
                    "type": {
                        "types": [
                            "date",
                            "decimal",
                            "decimal",
                            "timestamp",
                            "double",
                            "float",
                            "float",
                            "timestamp"
                        ],
                        "frozen": false
                    }
                },
                {
                    "name": "col4",
                    "type": {
                        "key_type": "bigint",
                        "value_type": "double",
                        "frozen": false
                    }
                },
                {
                    "name": "col5",
                    "type": {
                        "key_type": "bigint",
                        "value_type": "tinyint",
                        "frozen": false
                    }
                },
                {
                    "name": "col6",
                    "type": {
                        "types": [
                            "smallint",
                            "timestamp",
                            "decimal",
                            "inet",
                            "int",
                            "inet",
                            "varint",
                            "decimal",
                            "uuid"
                        ],
                        "frozen": true
                    }
                },
                {
                    "name": "col7",
                    "type": "float"
                },
                {
                    "name": "col8",
                    "type": "varchar"
                },
                {
                    "name": "col9",
                    "type": "tinyint"
                },
                {
                    "name": "col10",
                    "type": {
                        "key_type": "date",
                        "value_type": "decimal",
                        "frozen": true
                    }
                },
                {
                    "name": "col11",
                    "type": {
                        "types": {
                            "udt_3606525730_0": "double",
                            "udt_3606525730_1": "varint",
                            "udt_3606525730_2": "duration",
                            "udt_3606525730_3": "timeuuid",
                            "udt_3606525730_4": "text",
                            "udt_3606525730_5": "tinyint",
                            "udt_3606525730_6": "timeuuid",
                            "udt_3606525730_7": "tinyint",
                            "udt_3606525730_8": "float"
                        },
                        "type_name": "udt_3606525730",
                        "frozen": true
                    }
                },
                {
                    "name": "col12",
                    "type": "ascii"
                }
            ],
            "indexes": [
                {
                    "name": "col7_idx",
                    "column": "col7",
                    "column_idx": 7
                },
                {
                    "name": "col9_idx",
                    "column": "col9",
                    "column_idx": 9
                }
            ],
            "materialized_views": [
                {
                    "name": "table1_mv_0",
                    "partition_keys": [
                        {
                            "name": "col1",
                            "type": "uuid"
                        },
                        {
                            "name": "pk0",
                            "type": "tinyint"
                        },
                        {
                            "name": "pk1",
                            "type": "tinyint"
                        },
                        {
                            "name": "pk2",
                            "type": "tinyint"
                        }
                    ],
                    "clustering_keys": [
                        {
                            "name": "ck0",
                            "type": "blob"
                        },
                        {
                            "name": "ck1",
                            "type": "decimal"
                        }
                    ]
                }
            ],
            "known_issues": {
                "https://github.com/scylladb/scylla/issues/3708": true
            },
            "table_options": null
        }
    ]
}
{"L":"INFO","T":"2020-05-05T20:48:35.366Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2020-05-05T21:18:35.629Z","M":"Warmup done"}
{"L":"WARN","T":"2020-05-05T22:16:08.263Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"pump","M":"Test run completed. Exiting."}
{"L":"INFO","T":"2020-05-06T00:18:35.366Z","N":"generator","M":"stopping partition key generation loop","keys_created":11285996639,"keys_emitted":4351495}
{"L":"INFO","T":"2020-05-06T00:18:43.688Z","M":"All jobs complete"}
Error: gemini encountered errors, exiting with non zero status
gemini encountered errors, exiting with non zero status

When error detected, on one of test cluster node was running ENOSPC nemesis( simulate the end of space on node). But 2 other nodes are running without errors and not any errors in logs on them.

aleksbykov avatar May 06 '20 08:05 aleksbykov

This should be handled better now after application of PR #239 perhaps we can close this and repoen it after trying in the upcoming release @aleksbykov ?

dahankzter avatar May 18 '20 12:05 dahankzter

@dahankzter yes we can close it

aleksbykov avatar May 27 '20 07:05 aleksbykov

@dahankzter , unfortunately the issue is still reproduced.Not always

Seed:			54
Maximum duration:	3h0m0s
Warmup duration:	30m0s
Concurrency:		100
Test cluster:		[10.0.252.39]
Oracle cluster:		[10.0.130.36]
Output file:		/home/centos/gemini_result_dcaeb75a-f9d5-4dc7-9008-7f15586bdd49.log
Schema: {
    "keyspace": {
        "name": "ks1",
        "replication": {
            "class": "SimpleStrategy",
            "replication_factor": "3"
        },
        "oracle_replication": {
            "class": "SimpleStrategy",
            "replication_factor": "1"
        }
    },
    "tables": [
        {
            "name": "table1",
            "partition_keys": [
                {
                    "name": "pk0",
                    "type": "smallint"
                },
                {
                    "name": "pk1",
                    "type": "smallint"
                },
                {
                    "name": "pk2",
                    "type": "tinyint"
                }
            ],
            "clustering_keys": [
                {
                    "name": "ck0",
                    "type": "smallint"
                },
                {
                    "name": "ck1",
                    "type": "float"
                }
            ],
            "columns": [
                {
                    "name": "col0",
                    "type": {
                        "types": {
                            "udt_3625021091_0": "tinyint",
                            "udt_3625021091_1": "timeuuid",
                            "udt_3625021091_2": "inet"
                        },
                        "type_name": "udt_3625021091",
                        "frozen": true
                    }
                },
                {
                    "name": "col1",
                    "type": {
                        "types": [
                            "timeuuid",
                            "date",
                            "uuid",
                            "text",
                            "inet",
                            "varchar",
                            "date",
                            "text",
                            "inet",
                            "inet",
                            "varchar"
                        ],
                        "frozen": false
                    }
                },
                {
                    "name": "col2",
                    "type": "varchar"
                },
                {
                    "name": "col3",
                    "type": "varint"
                },
                {
                    "name": "col4",
                    "type": {
                        "key_type": "varint",
                        "value_type": "text",
                        "frozen": true
                    }
                },
                {
                    "name": "col5",
                    "type": "varint"
                },
                {
                    "name": "col6",
                    "type": "bigint"
                },
                {
                    "name": "col7",
                    "type": "date"
                },
                {
                    "name": "col8",
                    "type": {
                        "types": {
                            "udt_3663736436_0": "uuid",
                            "udt_3663736436_1": "blob",
                            "udt_3663736436_2": "tinyint",
                            "udt_3663736436_3": "boolean",
                            "udt_3663736436_4": "timeuuid",
                            "udt_3663736436_5": "float",
                            "udt_3663736436_6": "timeuuid",
                            "udt_3663736436_7": "duration",
                            "udt_3663736436_8": "text"
                        },
                        "type_name": "udt_3663736436",
                        "frozen": true
                    }
                },
                {
                    "name": "col9",
                    "type": "float"
                },
                {
                    "name": "col10",
                    "type": "varchar"
                },
                {
                    "name": "col11",
                    "type": "float"
                },
                {
                    "name": "col12",
                    "type": "float"
                }
            ],
            "indexes": [
                {
                    "name": "table1_col3_idx",
                    "column": "col3",
                    "column_idx": 3
                },
                {
                    "name": "table1_col5_idx",
                    "column": "col5",
                    "column_idx": 5
                },
                {
                    "name": "table1_col9_idx",
                    "column": "col9",
                    "column_idx": 9
                },
                {
                    "name": "table1_col11_idx",
                    "column": "col11",
                    "column_idx": 11
                },
                {
                    "name": "table1_col12_idx",
                    "column": "col12",
                    "column_idx": 12
                }
            ],
            "materialized_views": [
                {
                    "name": "table1_mv_0",
                    "partition_keys": [
                        {
                            "name": "col10",
                            "type": "varchar"
                        },
                        {
                            "name": "pk0",
                            "type": "smallint"
                        },
                        {
                            "name": "pk1",
                            "type": "smallint"
                        },
                        {
                            "name": "pk2",
                            "type": "tinyint"
                        }
                    ],
                    "clustering_keys": [
                        {
                            "name": "ck0",
                            "type": "smallint"
                        },
                        {
                            "name": "ck1",
                            "type": "float"
                        }
                    ]
                }
            ],
            "known_issues": {
                "https://github.com/scylladb/scylla/issues/3708": true
            }
        }
    ]
}
{"L":"INFO","T":"2020-07-02T10:02:00.806Z","N":"generator","M":"starting partition key generation loop"}
{"L":"WARN","T":"2020-07-02T10:32:07.757Z","N":"sample_results","M":"Errors detected. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:07.757Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2020-07-02T10:32:09.269Z","M":"result channel closed"}
gemini encountered errors, exiting with non zero status
Error: gemini encountered errors, exiting with non zero status

aleksbykov avatar Jul 02 '20 13:07 aleksbykov

@slivne / @penberg do you have someone that can handle this? Some jobs are failing and we can't tell why and where was the error. It could hide a real scylla bug.

roydahan avatar Oct 01 '20 11:10 roydahan

@roydahan, @aleksbykov , is this still the case ? I see we are doing .Sync on the end, so it should work properly, unless there is underlying golang issue that makes it loose data on exit.

dkropachev avatar May 03 '23 17:05 dkropachev

@dkropachev , yes, again got such results. Looks like gemini stucked on exiting: Job: https://jenkins.scylladb.com/job/enterprise-2023.1/job/gemini-/job/gemini-3h-with-nemesis-test/10/

gemini output:

}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}
(E}
{"L":"INFO","T":"2023-05-18T12:52:52.172Z","N":"generator","M":"starting partition key generation loop"}
{"L":"INFO","T":"2023-05-18T16:22:52.146Z","N":"pump","M":"Test run stopped. Exiting."}
{"L":"INFO","T":"2023-05-18T16:22:52.172Z","M":"Test run completed. Exiting."}

and there is no result file.

aleksbykov avatar May 30 '23 09:05 aleksbykov

@aleksbykov , Absense of result channel closed gives me a clue that workgroup did not quit on waiting:

		if err := g.Wait(); err != nil {
			logger.Debug("error detected", zap.Error(err))
		}

That part was redone using different mechanism at https://github.com/scylladb/gemini/commit/ef29ea56aab952d72ef0162db4ffc14f864c52f5

So, i could have gone on v.1.8.0, could you please upgrade gemini on 2023.1 branch and let's see if it happens there.

dkropachev avatar Jun 01 '23 12:06 dkropachev

@aleksbykov , did it happen recently ?

dkropachev avatar Jul 08 '23 02:07 dkropachev