
Gemini generates mutations non-deterministically

Open nuivall opened this issue 1 year ago • 4 comments

I altered the code to print the statements it performs:

diff --git a/pkg/jobs/jobs.go b/pkg/jobs/jobs.go
index c065b5c..ceb6496 100644
--- a/pkg/jobs/jobs.go
+++ b/pkg/jobs/jobs.go
@@ -334,6 +334,7 @@ func ddl(
                return nil
        }
        for _, ddlStmt := range ddlStmts.List {
+               fmt.Println(ddlStmt.PrettyCQL())
                if w := logger.Check(zap.DebugLevel, "ddl statement"); w != nil {
                        w.Write(zap.String("pretty_cql", ddlStmt.PrettyCQL()))
                }
@@ -379,6 +380,7 @@ func mutation(
                return err
        }
        if mutateStmt == nil {
+               fmt.Println("no statement generated")
                if w := logger.Check(zap.DebugLevel, "no statement generated"); w != nil {
                        w.Write(zap.String("job", "mutation"))
                }
@@ -391,6 +393,7 @@ func mutation(
                        g.GiveOld(mutateStmt.ValuesWithToken)
                }()
        }
+       fmt.Println(mutateStmt.PrettyCQL())
        if w := logger.Check(zap.DebugLevel, "mutation statement"); w != nil {
                w.Write(zap.String("pretty_cql", mutateStmt.PrettyCQL()))
        }

Then I ran it twice with the default seed of 1:

/gemini -d --duration 1s --warmup 0 -c 2 -m mixed -f --cql-features basic --max-mutation-retries 5 --max-mutation-retries-backoff 500ms --test-cluster=192.168.100.3 --oracle-cluster=192.168.100.2  --non-interactive=true --level=warn > out.txt

Each run generated around 500 insert statements, but a diff of the two outputs shows that they have little in common. Sorting both files first, to rule out mere non-deterministic ordering, still does not produce the same or even similar output.
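To put a number on "not much in common", the comparison can be done on multisets of lines instead of a sorted diff. This is only a minimal sketch: `commonLines` is a hypothetical helper, and the statements in `main` are placeholder strings, not actual gemini output.

```go
package main

import "fmt"

// commonLines counts the lines present in both inputs. Each input is treated
// as a multiset, so ordering is ignored but duplicates still count.
func commonLines(a, b []string) int {
	counts := make(map[string]int, len(a))
	for _, line := range a {
		counts[line]++
	}
	common := 0
	for _, line := range b {
		if counts[line] > 0 {
			counts[line]--
			common++
		}
	}
	return common
}

func main() {
	// Placeholder data standing in for the lines of two out.txt files.
	run1 := []string{"INSERT 1", "INSERT 2", "INSERT 2"}
	run2 := []string{"INSERT 2", "INSERT 3", "INSERT 2"}
	fmt.Printf("%d of %d statements in common\n", commonLines(run1, run2), len(run1))
}
```

If the seed fully determined the mutations, the count for two runs would be expected to equal the number of generated statements regardless of ordering.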

This undermines the reproducibility of issues. It would be much better if the seed also applied to mutations and not only to the initial schema. I don't know whether this was omitted in the original design or a bug was introduced later.
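One way the seed could apply to mutations even with `-c 2` is to give each worker its own generator derived from the base seed, so that the values a worker draws do not depend on goroutine scheduling. The sketch below is an assumption about how this might look, not gemini's code: `workerRand`, `generate`, and the `seed + worker` derivation are all made up for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// workerRand returns a generator whose stream depends only on the base seed
// and the worker index. The derivation (seed + worker) is an illustrative
// assumption, not what gemini does.
func workerRand(seed int64, worker int) *rand.Rand {
	return rand.New(rand.NewSource(seed + int64(worker)))
}

// generate lets each worker draw n values concurrently from its own generator.
func generate(seed int64, workers, n int) [][]int64 {
	out := make([][]int64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			r := workerRand(seed, w)
			vals := make([]int64, n)
			for i := range vals {
				vals[i] = r.Int63()
			}
			out[w] = vals // each worker writes only its own slot
		}(w)
	}
	wg.Wait()
	return out
}

// sameStreams reports whether two runs produced identical per-worker values.
func sameStreams(a, b [][]int64) bool {
	if len(a) != len(b) {
		return false
	}
	for w := range a {
		if len(a[w]) != len(b[w]) {
			return false
		}
		for i := range a[w] {
			if a[w][i] != b[w][i] {
				return false
			}
		}
	}
	return true
}

func main() {
	first := generate(1, 2, 5)
	second := generate(1, 2, 5)
	fmt.Println("runs identical:", sameStreams(first, second))
}
```

The interleaving of statements across workers can still differ between runs, but the set of statements each worker generates would then be a function of the seed alone.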

nuivall avatar Jun 22 '23 13:06 nuivall

@nuivall, the reason is that we are not using pseudo-randomization when generating the schema, and you did not provide a schema to gemini.

dkropachev avatar Jul 01 '23 12:07 dkropachev

I have implemented #376 to isolate schema seed from statements seed and found that it does not fix all the problems.

dkropachev avatar Jul 02 '23 03:07 dkropachev

Actually, when I was testing, the schema was the only stable thing (dependent on the seed).

nuivall avatar Jul 04 '23 08:07 nuivall

It wasn't closed by #376

dkropachev avatar Aug 10 '23 16:08 dkropachev