CCF icon indicating copy to clipboard operation
CCF copied to clipboard

Quantify and expand coverage of TLC simulation mode

Open lemmy opened this issue 1 year ago • 11 comments

TLC's simulation mode is generally effective at detecting regressions while being less resource-intensive than full model checking, which helps reduce the strain on our CI resources. However, the actual coverage of simulation mode is unknown. Additionally, increasing coverage is highly desirable.

  • [x] Quantify current coverage
  • [x] Check liveness properties during simulation
    • https://github.com/microsoft/CCF/pull/6545
  • [ ] Expand coverage by starting simulation from a larger set of initial states
    • [ ] Manually defined set of initial states
    • [ ] Adjust probabilities of failure actions based on length of trace, ...
    • [ ] RandomSetOfSubsets inductive invariant candidate

lemmy avatar Oct 07 '24 22:10 lemmy

Quantify current coverage with five nodes

Main observation: The RcvProposeVoteRequest action rarely, if ever, occurs, likely because the upper limit set for the length of the generated traces (-depth) is too restrictive. Note that exhaustive model checking shows that it takes 28/47 states with 2/3 nodes and 2 configuration for one of th nodes to be Retired Committed (client requests disabled).

The following plots display statistics gathered from a long-running simulation (note that the overall overhead associated with extended statistics (-Dtlc2.tool.Simulator.extendedStatistics=true) and the accuracy of the probabilistic data structure have been analyzed independently).

Figure_3

Python code used to generate the plot
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_json("SIMccfraft.ndjson", lines=True)
json_data = data.iloc[-1].to_dict()
actions_df = pd.DataFrame(json_data["actions"].items(), columns=["Action", "Frequency"])
actions_df_sorted = actions_df.sort_values(by="Frequency", ascending=False)
plt.figure(figsize=(10, 6))
plt.bar(actions_df_sorted["Action"], actions_df_sorted["Frequency"])
plt.yscale('log')
plt.xticks(rotation=90)
plt.xlabel('Actions')
plt.ylabel('Frequency (Log scale)')
plt.title('Action Frequencies with Logarithmic Y-Axis')
plt.tight_layout()
plt.show()

Figure_1

Python code used to generate the plot
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_json("SIMccfraft.ndjson", lines=True)
plt.plot(data['duration'], data['distinct'], marker='x', linestyle='--', color='black', label='distinct')
plt.plot(data['duration'], data['generated'], marker='o', linestyle='-', color='blue', label='generated')
plt.title('Distinct states over Duration')
plt.xlabel('Duration')
plt.ylabel('Count')
plt.grid(True)
plt.legend()
plt.show()

Variables with finitely many distinct values:

  • membershipState Cardinality([Servers -> MembershipState])
  • leadershipState Cardinality([Servers -> LeeadershipState])
  • retirementCompleted Cardinality([Servers -> SUBSET Servers])
  • hasJoined Cardinality([Servers -> BOOLEAN])
  • isNewFollower Cardinality([Servers -> BOOLEAN])

Figure_2

Python code used to generate the plot
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.cm as cm
import numpy as np
data = pd.read_json("SIMccfraft.ndjson", lines=True)
plt.figure(figsize=(10, 6))
i = 1
colors = cm.rainbow(np.linspace(0, 1, len(data['distinctvalues'].iloc[-1].keys())))
for value, color in zip(data['distinctvalues'].iloc[-1].keys(),colors):
    plt.subplot(4, 4, i)
    plt.plot(data['duration'], data['distinctvalues'].apply(lambda x: x.get(value, None)), marker='o', linestyle='-', label=value, color=color)
    plt.xlabel('Duration')
    plt.ylabel('Count')
    plt.grid(True)
    plt.legend()
    i += 1
plt.show()
@lemmy ➜ /workspaces/CCF (main) $ git diff
diff --git a/tla/consensus/SIMccfraft.cfg b/tla/consensus/SIMccfraft.cfg
index 675b9e322..de6948a74 100644
--- a/tla/consensus/SIMccfraft.cfg
+++ b/tla/consensus/SIMccfraft.cfg
@@ -49,12 +49,15 @@ CONSTANTS
     Extend <- [abs]ABSExtend
     CopyMaxAndExtend <- [abs]ABSCopyMaxAndExtend
 
-CONSTRAINT
-    StopAfter
+\* CONSTRAINT
+\*     StopAfter
 
 CHECK_DEADLOCK
     FALSE
 
+_PERIODIC
+    PrintStats
+
 PROPERTIES 
     CommittedLogAppendOnlyProp
     MonotonicTermProp
diff --git a/tla/consensus/SIMccfraft.tla b/tla/consensus/SIMccfraft.tla
index 3d86eb251..5663f08d2 100644
--- a/tla/consensus/SIMccfraft.tla
+++ b/tla/consensus/SIMccfraft.tla
@@ -45,6 +45,9 @@ DebugInvUpToDepth ==
     \* -depth after generating the first trace.
     TLCGet("level") < TLCGet("config").depth
 
+PrintStats ==
+    Serialize(<<TLCGet("stats")>>, "SIMccfraft.ndjson", [format |-> "NDJSON", charset |-> "UTF-8", openOptions |-> <<"WRITE", "CREATE", "APPEND">>])
+
 ----
 \* Refinement

Slightly better coverage with -depth 1000:

Figure_4

lemmy avatar Oct 08 '24 00:10 lemmy

@lemmy a detail, but:

plt.plot(data['duration'], data['generated'], marker='o', linestyle='-', color='blue', label='distinct')

should be:

plt.plot(data['duration'], data['generated'], marker='o', linestyle='-', color='blue', label='generated')

achamayou avatar Oct 08 '24 08:10 achamayou

@lemmy increasing the depth does sound like a good step to improve coverage.

RequestVote seems overly represented, even if we think that we want to focus on fault-heavy scenarios. Similarly, it looks like (and this is very rough from eyeballing the graph) that request votes is about two orders of magnitude more frequent than BecomeLeader. That's a lot of failed elections!

My sense is that we should aim to diminish timeout frequency, by maybe an order of magnitude or so, to get to stage where we are generally progressing further. It's still be quite a lot more faults than a typical system.

achamayou avatar Oct 08 '24 08:10 achamayou

I've been running simulations of the specification for a few hours, and so far, neither LogMatchingProp nor LeaderProp have been violated. However, the overall coverage is limited (see below) because I had to remove RandomElement, as it doesn't work well ENABLED/fairness. The incompatibility of RandomElement and ENABLED suggests to revive reinforcement learning to a) eliminate RandomElement from the spec, and b) learn the function that increases coverage automatically.

@lemmy ➜ /workspaces/CCF (main) $ git diff
diff --git a/tla/consensus/SIMccfraft.cfg b/tla/consensus/SIMccfraft.cfg
index d09154097..db75ab5b0 100644
--- a/tla/consensus/SIMccfraft.cfg
+++ b/tla/consensus/SIMccfraft.cfg
@@ -48,8 +48,8 @@ CONSTANTS
     Extend <- [abs]ABSExtend
     CopyMaxAndExtend <- [abs]ABSCopyMaxAndExtend
 
-CONSTRAINT
-    StopAfter
+\* CONSTRAINT
+\*     StopAfter
 
 CHECK_DEADLOCK
     FALSE
@@ -64,6 +64,9 @@ PROPERTIES
     PendingBecomesFollowerProp
     NeverCommitEntryPrevTermsProp
     RefinementToAbsProp
+    
+    LogMatchingProp
+    LeaderProp
 
 \* ALIAS
 \*     \* DebugAlias
diff --git a/tla/consensus/SIMccfraft.tla b/tla/consensus/SIMccfraft.tla
index 0c504749b..a4a3cbdc4 100644
--- a/tla/consensus/SIMccfraft.tla
+++ b/tla/consensus/SIMccfraft.tla
@@ -17,20 +17,12 @@ SIMInitReconfigurationVars ==
     \/ CCF!InitReconfigurationVars
 
 SIMCheckQuorum(i) ==
-    /\ 1 = RandomElement(1..10)
     /\ CCF!CheckQuorum(i)
 
 SIMChangeConfigurationInt(i, newConfiguration) ==
-    /\ 1 = RandomElement(1..100)
     /\ CCF!ChangeConfigurationInt(i, newConfiguration)
 
 SIMTimeout(i) ==
-    /\ \/ 1 = RandomElement(1..100)
-       \* Always allow Timeout if no messages are in the network
-       \* and no node is a candidate or leader.  Otherwise, the system
-       \* will deadlock if 1 # RandomElement(...).
-       \/ /\ \A s \in Servers: leadershipState[s] \notin {Leader, Candidate}
-          /\ Network!Messages = {}
     /\ CCF!Timeout(i)
 
 \* The state constraint  StopAfter  stops TLC after the alloted
diff --git a/tla/consensus/ccfraft.tla b/tla/consensus/ccfraft.tla
index 4857ff4b9..16d6e5d9d 100644
--- a/tla/consensus/ccfraft.tla
+++ b/tla/consensus/ccfraft.tla
@@ -1298,6 +1298,8 @@ Spec ==
     /\ \A s \in Servers : WF_vars(AdvanceCommitIndex(s))
     /\ \A s \in Servers : WF_vars(BecomeLeader(s))
     /\ \A s \in Servers : WF_vars(Timeout(s))
+    /\ \A s \in Servers : WF_vars(AppendRetiredCommitted(s))
+    /\ \A s \in Servers : WF_vars(ChangeConfiguration(s))
 
 ------------------------------------------------------------------------------
 \* Correctness invariants

Figure_1 Figure_2 Figure_3

lemmy avatar Oct 08 '24 17:10 lemmy

@lemmy perhaps I missing something here, but this does not seem like an improvement? It looks like there are fewer distinct states in proportion than before, the frequency of RequestVote has increased even more, and it looks like no successful retirement happened (I notice the run is shorter, but I am not sure that explains it).

achamayou avatar Oct 09 '24 12:10 achamayou

@lemmy perhaps I missing something here, but this does not seem like an improvement? It looks like there are fewer distinct states in proportion than before, the frequency of RequestVote has increased even more, and it looks like no successful retirement happened (I notice the run is shorter, but I am not sure that explains it).

The plots above display the state-space coverage without a de-prioritization of the actions Timeout, ChangeConfiguration, and CheckQuorum (see diff). Meanwhile, the coverage of the property space has improved, as it included the two liveness properties.

With the changes introduced in https://github.com/microsoft/CCF/pull/6545/commits/47eb0d99493c6ff2102615d23640e2c51a438977, state-space coverage returned to its previous level (see below), while also checking the two liveness properties:

Image Image Image

The plots appear more irregular because the data structure used to store the statistics, in this case HyperLogLog, is probabilistic and the duration is less.

lemmy avatar Oct 09 '24 15:10 lemmy

Ah, got it, that's excellent!

achamayou avatar Oct 09 '24 15:10 achamayou

Related: https://github.com/tlaplus/tlaplus/issues/1065

lemmy avatar Nov 11 '24 21:11 lemmy

@lemmy may be a silly question, but I was not able to find the info in your comment or the documentation (e.g., https://lamport.azurewebsites.net/tla/current-tools.pdf): Which TLC flags must I use to generate the required .ndjson file? Thanks

melhindi avatar Feb 11 '25 16:02 melhindi

@melhindi To enable TLC to periodically and at the end of model checking write the TLCGet("stats") to the SIMccfraft.ndjson file, simply add a definition like PrintStats to your specification. Then, add _PERIODIC PrintStats and POSTCONDITION PrintStats to the .cfg file.

EXTENDS IOUtils, TLC

PrintStats ==
    Serialize(<<TLCGet("stats")>>, "SIMccfraft.ndjson", [format |-> "NDJSON", charset |-> "UTF-8", openOptions |-> <<"WRITE", "CREATE", "APPEND">>])

IOUtils is part of the CommunityModules. If you’re using TLC from the Toolbox or VSCode, you won’t need to manually download it. However, if you’re running TLC from the command line, ensure that you place CommunityModules-deps.jar alongside tla2tools.jar. Note that all of this requires a recent nightly build of TLC.

Related: https://github.com/tlaplus/tlaplus/issues/1023

lemmy avatar Feb 11 '25 23:02 lemmy

Hi @lemmy, thanks, that worked. A small "caveat" was that I initially forgot to use the -simulate flag and pass the -Dtlc2.tool.Simulator.extendedStatistics=true option (hence, the output was initially incomplete) To summarize, here are the steps that were needed to get the json output:

  1. Add the following to the .tla file:
EXTENDS IOUtils, TLC

PrintStats ==
    Serialize(<<TLCGet("stats")>>, "SIMccfraft.ndjson", [format |-> "NDJSON", charset |-> "UTF-8", openOptions |-> <<"WRITE", "CREATE", "APPEND">>])
  1. Add the following to the .cfg file:
_PERIODIC PrintStats
POSTCONDITION PrintStats
  1. Make sure to pass the -Dtlc2.tool.Simulator.extendedStatistics=true option and use -simulate, i.e.:
java -Dtlc2.tool.Simulator.extendedStatistics=true -XX:+UseParallelGC -cp "tla2tools.jar:CommunityModules.jar" tlc2.TLC PATH-TO-SPEC -config PATH-TO-CFG-FILE -workers auto -simulate

melhindi avatar Feb 12 '25 09:02 melhindi