[WIP] New multi-node infrastructure for integration tests
This replaces our previous "bunch of shell scripts" integration tests.
Resolves https://github.com/apple/swift-distributed-actors/issues/900
I actually found a bug while working on this, so this PR will also solve https://github.com/apple/swift-distributed-actors/issues/1054.
Ignore the ad-hoc JSON Coders here; those were only added to debug that issue.
This introduces a new way to write multi-node tests which span actual processes and automatically form a cluster. We can aggressively KILL those processes and assert on the outputs of such clusters.
We will also easily be able to deploy tests written using this infrastructure to multiple actual physical nodes or Docker containers -- similar to what Akka's multi-jvm tests did back in the day. This will allow us to verify behavior on real networks, etc.
It is also amazing for reproducers -- we can replicate behavior exactly, without having to do the weird "make sure we resolve as remote" dance and other tricks.
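For context, the "make sure we resolve as remote" dance refers to what single-process tests have to do to fake remoteness: spin up two `ClusterSystem`s inside one process and resolve an actor's id on the *other* system so that calls actually go over the wire. A rough sketch of that pattern -- the `Greeter` actor and the helper function here are purely illustrative, not part of this PR:

```swift
import Distributed
import DistributedActors

distributed actor Greeter {
    typealias ActorSystem = ClusterSystem

    init(actorSystem: ActorSystem) {
        self.actorSystem = actorSystem
    }

    distributed func greet(name: String) -> String {
        "Hello, \(name)!"
    }
}

/// The single-process "remote" dance: both systems live in the same process
/// (and are assumed to have joined each other), and we must remember to
/// resolve the reference on the *other* system to get a real remote call.
func oldStyleRemoteDance(systemA: ClusterSystem, systemB: ClusterSystem) async throws {
    let local = Greeter(actorSystem: systemA)
    let remote = try Greeter.resolve(id: local.id, using: systemB)
    _ = try await remote.greet(name: "Caplin")
}
```

With the multi-node infrastructure there is nothing to get wrong here: the nodes are separate processes, so every cross-node interaction is remote by construction.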
Screenshot, just FYI, of what the output looks like -- speaking for myself, I can't get complicated things solved without such reliable test infra, so I'm more than happy it is back!
Running tests is done via `swift package --disable-sandbox multi-node -c debug test` (or just `swift package --disable-sandbox multi-node test` to run in `-c release` mode). The plugin automatically compiles and runs the tests in individual processes.
This is what an example test case looks like:
````swift
import DistributedActors
import MultiNodeTestKit

public final class ClusterCrashMultiNodeTests: MultiNodeTestSuite {
    public init() {}

    /// Spawns two nodes: first and second, and forms a cluster with them.
    ///
    /// ## Default execution
    /// Unlike normal unit tests, each node is spawned in a separate process,
    /// allowing us to kill nodes harshly by killing entire processes.
    ///
    /// It also eliminates the possibility of "cheating" and a node peeking
    /// at shared state, since the nodes are properly isolated as if in a real cluster.
    ///
    /// ## Distributed execution
    /// To execute the same test across different physical nodes, pass a list of
    /// nodes to use when running the test, e.g.
    ///
    /// ```
    /// swift package multi-node test --deploy 192.168.0.101:22,192.168.0.102:22,192.168.0.103:22 // TODO
    /// ```
    ///
    /// which will evenly spread the test nodes across the passed physical worker nodes.
    /// An actual network will be used, it remains possible to kill off nodes, and logs
    /// from all nodes are gathered automatically upon test failures.
    public enum Nodes: String, MultiNodeNodes {
        case first
        case second
    }

    public static func configureMultiNodeTest(settings: inout MultiNodeTestSettings) {
        settings.initialJoinTimeout = .seconds(5)
        settings.dumpNodeLogs = .always
        settings.installPrettyLogger = false
    }

    public static func configureActorSystem(settings: inout ClusterSystemSettings) {
        settings.logging.logLevel = .debug
    }

    public let testCrashSecondNode = MultiNodeTest(ClusterCrashMultiNodeTests.self) { multiNode in
        // A checkPoint suspends until all nodes have reached it, and then all nodes resume execution.
        try await multiNode.checkPoint("initial")

        // We can execute code only on a specific node:
        try await multiNode.on(.second) { second in
            try second.shutdown()
            return
        }

        try await multiNode.runOn(.first) { first in
            try await first.cluster.waitFor(multiNode[.second], .down, within: .seconds(10))
        }
    }
}
````
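And just to illustrate the "aggressively KILL those processes" point from above: because each node is its own process, a test can also take a node down abruptly rather than shutting it down gracefully. A hedged sketch of such a variation, which would live next to `testCrashSecondNode` in the suite above -- the test name, the checkpoint name, and crashing via `fatalError` are my assumptions for illustration, not something this PR spells out:

```swift
public let testSecondNodeCrashesHard = MultiNodeTest(ClusterCrashMultiNodeTests.self) { multiNode in
    try await multiNode.checkPoint("before-crash")

    try await multiNode.runOn(.second) { _ in
        // Abruptly take down the entire process hosting the `second` node,
        // without any graceful leaving or shutdown. (Assumption: the harness
        // tolerates this on the node we intend to crash.)
        fatalError("simulated hard crash of the second node")
    }

    try await multiNode.runOn(.first) { first in
        // The surviving node should eventually observe the crashed node as .down,
        // just like in the graceful-shutdown example above.
        try await first.cluster.waitFor(multiNode[.second], .down, within: .seconds(30))
    }
}
```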
Added CI pipeline to run integration tests.
@swift-server-bot test this please
This is uncovering bugs in the receptionist rewrite, where ordering wasn't quite right anymore, resulting in test hangs (and bad receptionist ordering bugs in the op-log).
Mostly stable locally, but still stabilizing tests while I'm here...