
Random test failure: "Somehow we have more logs then commands"

pzel opened this issue on Oct 17 '19 · 1 comment

What we do to trigger the bug:

while (mix test test/fuzzy/partitions_test.exs:36); do sleep 1; done

What we expect:

When running the fuzzy/partitions_test repeatedly, we expect it to pass every time; the loop above keeps re-running the test and only terminates once a run fails.

What happens:

Some runs trigger a test failure resembling the following (sometimes the unexplained list contains one Entry, sometimes two):

Logs we don't understand: [
  %Raft.Log.Entry{data: {:put, -34}, index: 471, term: 3, type: :command},
  %Raft.Log.Entry{data: {:put, -34}, index: 472, term: 3, type: :command}
]


  1) test node shutdowns (Raft.Fuzzy.PartitionsTest)
     test/fuzzy/partitions_test.exs:36
     ** (RuntimeError) Somehow we have more logs then commands wtf.
     code: assert Cluster.all_logs_match(cluster, commands)
     stacktrace:
       (raft) test/support/cluster.ex:132: Raft.Support.Cluster.compare_logs/3
       (raft) test/support/cluster.ex:123: Raft.Support.Cluster.missing_writes_on_server/2
       (elixir) lib/enum.ex:1336: Enum."-map/2-lists^map/1-0-"/2
       (elixir) lib/enum.ex:1336: Enum."-map/2-lists^map/1-0-"/2
       (raft) test/support/cluster.ex:117: Raft.Support.Cluster.missing_writes/2
       (raft) test/support/cluster.ex:107: Raft.Support.Cluster.verify_logs/2
       test/fuzzy/partitions_test.exs:61: (test)

     The following output was logged:

     16:30:46.462 [error] Task #PID<0.232.0> started from :s0 terminating
     ** (stop) exited in: :gen_statem.call({:s3, :nonode@nohost}, %Raft.RPC.AppendEntriesReq{entries: [], from: {:s0, :nonode@nohost}, leader_commit: 1, leader_id: {:s0, :nonode@nohost}, prev_log_index: 1, prev_log_term: 1, term: 1, to: {:s3, :nonode@nohost}}, :infinity)
         ** (EXIT) no process: the process is not alive or there's no process currently associated with the given name, possibly because its application isn't started
         (stdlib) gen.erl:228: :gen.do_for_proc/2
         (stdlib) gen_statem.erl:619: :gen_statem.call_dirty/4
         (raft) lib/raft/rpc.ex:80: Raft.RPC.do_send/1
         (elixir) lib/task/supervised.ex:90: Task.Supervised.invoke_mfa/2
         (stdlib) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
     Function: #Function<2.92344542/0 in Raft.RPC.send_msg/1>
         Args: []
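
For context, the crash logged above is the RPC task dying because the fuzzer had already shut down :s3, so the :gen_statem.call/3 underneath Raft.RPC.do_send/1 hits a process that no longer exists. A minimal sketch of how such a send could treat a dead peer like a dropped message instead of crashing the task (assuming do_send/1 wraps :gen_statem.call/3, as the stacktrace suggests; illustrative only, not the library's actual code):

defp do_send(%{to: to} = rpc) do
  try do
    :gen_statem.call(to, rpc, :infinity)
  catch
    # The peer may have been stopped or partitioned away by the fuzzer;
    # a :noproc exit here is morally the same as the message being lost.
    :exit, {:noproc, _} -> :error
  end
end

Swallowing :noproc would keep the Task supervisor quiet during shutdowns; a real fix might instead report the failure back to the leader's replication loop. Whether this crash is related to the surplus entries is an open question.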

Additional info:

We have added the following debug output to try to understand precisely what goes wrong during the failed runs.

diff --git a/test/support/cluster.ex b/test/support/cluster.ex
index 25968bd..de65f99 100644
--- a/test/support/cluster.ex
+++ b/test/support/cluster.ex
@@ -87,6 +87,8 @@ defmodule Raft.Support.Cluster do
       {s, data}
     end

+    IO.inspect data, label: :data, limit: :infinity
+    IO.inspect commands, label: :commands, limit: :infinity
     verify_terms(data) && verify_logs(data, commands)
   end
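
Once that output is captured, a frequency diff between one server's log and the applied commands can pinpoint exactly which entries are surplus, even when the generator legitimately produced the same command twice. A rough sketch (entries and commands are hypothetical bindings for one server's list of %Raft.Log.Entry{} structs and the Applier's command list; not part of the test suite):

# Count occurrences without relying on Enum.frequencies/1 (Elixir >= 1.10).
freq = fn list ->
  Enum.reduce(list, %{}, fn x, acc -> Map.update(acc, x, 1, &(&1 + 1)) end)
end

log_counts =
  entries
  |> Enum.filter(&(&1.type == :command)) # ignore any non-command entries
  |> Enum.map(& &1.data)
  |> freq.()

cmd_counts = freq.(commands)

# Commands that appear more often in the log than in the applied list.
surplus =
  for {cmd, n} <- log_counts, n > Map.get(cmd_counts, cmd, 0) do
    {cmd, n - Map.get(cmd_counts, cmd, 0)}
  end

IO.inspect(surplus, label: :surplus, limit: :infinity)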

We have also simplified the StreamData generator to make the command logs easier to read:

diff --git a/test/fuzzy/partitions_test.exs b/test/fuzzy/partitions_test.exs
index 6bee922..11e7bd4 100644
--- a/test/fuzzy/partitions_test.exs
+++ b/test/fuzzy/partitions_test.exs
@@ -10,7 +10,7 @@ defmodule Raft.Fuzzy.PartitionsTest do
   }

   def commands, do: one_of([
-    tuple({constant(:put), term()}),
+    tuple({constant(:put), integer()}),
     constant(:pop),
   ])

@@ -53,7 +53,8 @@ defmodule Raft.Fuzzy.PartitionsTest do
       Cluster.restart(cluster, shutdown)
     end

-    {commands, _errors} = Applier.stop_applying(applier)
+    {commands, errors} = Applier.stop_applying(applier)
+    IO.inspect(errors, label: :errors)
     IO.inspect(commands, label: "Commands after applying")
     Cluster.stop(cluster)
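
As a sanity check, the simplified generator can be sampled directly, since StreamData generators are enumerable (a quick IEx sketch, assuming the same StreamData imports as the test file):

import StreamData

# Each generated command is either {:put, integer} or :pop.
commands_gen =
  one_of([
    tuple({constant(:put), integer()}),
    constant(:pop)
  ])

Enum.take(commands_gen, 5)
#=> e.g. [{:put, 0}, :pop, {:put, -1}, {:put, 2}, :pop]

With integer() in place of term(), every :put carries a small integer, which makes duplicated entries like the {:put, -34} pair above easy to spot by eye.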

Attached is the full output of a failed test, run with the above changes in place.

more-logs-than-commands-no-errors.log

Linking @studzien; we worked on this together.

pzel · Oct 17 '19 08:10

Thanks for the report! I'll see if I can figure out exactly what's going on. I've finally cleared off enough other projects to spend some time on this again.

keathley · Oct 24 '19 14:10