TestCluster.Deploy: Address already in use. (In Orleans 3.5.0)
We have an intermittent issue when testing with an xUnit IClassFixture. We use the fixture so that the cluster is not recreated for every test, because creating it takes some time. The issue appears at random times and in random tests (we have hundreds of tests): sometimes it shows up in about 1 of 60 runs, sometimes in about 1 of 10. It only seems to happen on Linux, not on Windows.
The error shows like this:
```
System.AggregateException : One or more errors occurred. (Address already in use) (The following constructor parameters did not have matching fixture data: CommonFixture fixture)
---- Microsoft.AspNetCore.Connections.AddressInUseException : Address already in use
-------- System.Net.Sockets.SocketException : Address already in use
---- The following constructor parameters did not have matching fixture data: CommonFixture fixture
Stack Trace:
----- Inner Stack Trace #1 (Microsoft.AspNetCore.Connections.AddressInUseException) -----
at Orleans.Networking.Shared.SocketConnectionListener.Bind()
at Orleans.Networking.Shared.SocketConnectionListenerFactory.BindAsync(EndPoint endpoint, CancellationToken cancellationToken)
at Orleans.Runtime.Messaging.ConnectionListener.BindAsync(CancellationToken cancellationToken)
at Orleans.Runtime.Messaging.GatewayConnectionListener.OnRuntimeInitializeStart(CancellationToken cancellationToken)
at Orleans.Runtime.SiloLifecycleSubject.MonitoredObserver.OnStart(CancellationToken ct)
at Orleans.LifecycleSubject.OnStart(CancellationToken ct)
at Orleans.Runtime.Scheduler.AsyncClosureWorkItem.Execute()
at Orleans.Runtime.Silo.StartAsync(CancellationToken cancellationToken)
at Orleans.Hosting.SiloHost.StartAsync(CancellationToken cancellationToken)
at Orleans.TestingHost.InProcessSiloHandle.<>c__DisplayClass7_0.<<CreateAsync>b__0>d.MoveNext()
--- End of stack trace from previous location ---
at Orleans.TestingHost.InProcessSiloHandle.CreateAsync(String siloName, IList`1 configurationSources)
at Orleans.TestingHost.TestCluster.StartSiloAsync(Int32 instanceNumber, TestClusterOptions clusterOptions, IReadOnlyList`1 configurationOverrides, Boolean startSiloOnNewPort)
at Orleans.TestingHost.TestCluster.InitializeAsync()
at Orleans.TestingHost.TestCluster.DeployAsync()
at Orleans.TestingHost.TestCluster.DeployAsync()
at Orleans.TestingHost.TestCluster.Deploy()
at Kraken.Tools.Tests.Utils.FixtureBase..ctor(String playerNicknamePrefix, Int32 playersCount, Int32 serverInitialPort, Int32 serverCount) in /var/home/core/kraken/backend/src/Kraken.Tools.Tests/Utils/FixtureBase.cs:line 50
at Kraken.Tools.Tests.Fixtures.CommonFixture..ctor() in /var/home/core/kraken/backend/src/Kraken.Tools.Tests/Fixtures/CommonFixture.cs:line 7
----- Inner Stack Trace -----
at System.Net.Sockets.Socket.UpdateStatusAfterSocketErrorAndThrowException(SocketError error, String callerName) in /_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 5019
at System.Net.Sockets.Socket.DoBind(EndPoint endPointSnapshot, SocketAddress socketAddress) in /_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 828
at System.Net.Sockets.Socket.Bind(EndPoint localEP) in /_/src/libraries/System.Net.Sockets/src/System/Net/Sockets/Socket.cs:line 806
at Orleans.Networking.Shared.SocketConnectionListener.Bind()
----- Inner Stack Trace #2 (Xunit.Sdk.TestClassException) -----
```
Our test structure is very similar to the code in https://github.com/dotnet/orleans/pull/5715. I have tried switching to Xunit.IAsyncLifetime with DisposeAsync, but the same issue still occurs. I wonder whether some processes have not finished shutting down when the silo for the next test starts, so that ports still held from the previous test cause conflicts. To check this, I added Task.Delay(10000); at the beginning of the ClusterFixture constructor, but it didn't help. I am not sure whether this is the problem; can you give me some advice? Thank you.
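One detail worth checking in that experiment: if the delay was added literally as the statement `Task.Delay(10000);`, it does not pause the constructor at all. `Task.Delay` only returns a Task that completes later; without `await` (which a constructor cannot use) or a blocking wait, execution continues immediately. This small self-contained sketch, with the delay shortened to 300 ms, shows the difference. In an IAsyncLifetime fixture the equivalent would be `await Task.Delay(...)` inside InitializeAsync.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

var sw = Stopwatch.StartNew();
Task.Delay(300);          // only creates a Task that completes later; nothing waits for it
long bareMs = sw.ElapsedMilliseconds;

sw.Restart();
Task.Delay(300).Wait();   // blocks the current thread until the delay has elapsed
long blockingMs = sw.ElapsedMilliseconds;

Console.WriteLine($"bare statement: {bareMs} ms, blocking wait: {blockingMs} ms");
```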
Hi. This error occurs for us too. It never happens in dev, only in DevOps. One solution was to tear down the cluster after every test, but then it takes forever to run a few hundred tests...
We are using NUnit, and we currently avoid this problem by instantiating and configuring the test cluster in the test class constructor and never tearing it down, only cleaning up data after each test. That works for us because we are not strictly unit testing; we are doing behaviour testing of domains where Orleans is involved...
Hi @cerkoid,
Thank you for the suggestion to use **NUnit**; that may be one way to solve it. By the way, I already tear down the cluster after every test, and I even tried calling KillSiloAsync after stopping the silos, like this:
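```csharp
public async Task DisposeAsync()
{
    // Stop every silo in the cluster gracefully.
    await Cluster.StopAllSilosAsync();

    // Then forcibly kill any silo handles the cluster still reports.
    foreach (SiloHandle silo in Cluster.Silos)
    {
        await Cluster.KillSiloAsync(silo);
    }

    await Cluster.DisposeAsync();
    GC.SuppressFinalize(this);
}
```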
But the "Address already in use" error still appears...
Any idea about this issue? @ReubenBond, @benjaminpetit thank you!
Apologies for the delay, @pikaqu888
How are you configuring your TestCluster? By default, it chooses random ports and tries to avoid ports that are already in use, but this is not infallible. There is also the issue that ports can remain unusable for a period of time after you close them, as a precautionary measure by the OS (the TCP TIME_WAIT state).
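To illustrate the "pick a port that is not already in use" idea in isolation: a common way to obtain a currently unused port is to bind to port 0 and let the OS choose one. This is a minimal sketch using plain .NET sockets; the helper name `GetFreeTcpPort` is hypothetical and is not an Orleans API. It shares the weakness mentioned above: the port is released before the caller uses it, so another process can still take it in between.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not an Orleans API): bind to port 0 so the OS picks an unused port.
// The listener is stopped before returning, so the port is free again for the caller;
// another process can still claim it in between (this narrows the race, it does not remove it).
static int GetFreeTcpPort()
{
    var listener = new TcpListener(IPAddress.Loopback, 0);
    listener.Start();
    try
    {
        return ((IPEndPoint)listener.LocalEndpoint).Port;
    }
    finally
    {
        listener.Stop();
    }
}

int port = GetFreeTcpPort();
Console.WriteLine($"OS-assigned port: {port}");
```

Feeding such a port into the cluster's port settings would be an additional, unverified step; since a cluster with several silos typically needs more than one port, a single free port does not guarantee that its neighbours are free too.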
Hi @ReubenBond,
Thank you for your reply, and sorry for the delay. The TestCluster is configured as in #5715, but the tests are spread across different projects. After writing a log line at the beginning of the constructor and another at the end of Dispose(), I found that even with "parallelizeTestCollections": false set in xunit.runner.json, some of the different projects start running in parallel, but only on Linux, not on Windows. That seems to be why "Address already in use" sometimes appears. I am not sure, but I think this may not be an Orleans problem. What do you think?
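A likely explanation for this observation: xUnit's `parallelizeTestCollections` only governs test collections inside a single test assembly, so it cannot stop separate test projects from being run at the same time. One possible workaround, sketched here under the assumption that all test projects run on the same machine, is a cross-process lock that each fixture holds while its cluster is alive (acquire it before deploying, dispose it after disposing the cluster). The helper `AcquireFileLock` and the lock file name are hypothetical; the lock relies on `FileShare.None`, which .NET enforces across processes on both Windows and Linux.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: a cross-process lock built on a file opened with FileShare.None.
// While one holder keeps the returned stream open, any other attempt to open the file
// (from another process or another handle) throws IOException, so it backs off and retries.
static FileStream AcquireFileLock(string path, TimeSpan timeout)
{
    DateTime deadline = DateTime.UtcNow + timeout;
    while (true)
    {
        try
        {
            return new FileStream(path, FileMode.OpenOrCreate, FileAccess.ReadWrite, FileShare.None);
        }
        catch (IOException) when (DateTime.UtcNow < deadline)
        {
            Thread.Sleep(200); // lock is held elsewhere; wait and try again
        }
    }
}

// Hypothetical lock file shared by every test project on the machine.
string lockPath = Path.Combine(Path.GetTempPath(), "orleans-testcluster.lock");

using (AcquireFileLock(lockPath, TimeSpan.FromMinutes(5)))
{
    // Deploy the TestCluster, run this project's tests, and dispose the cluster here.
    Console.WriteLine("Holding the test-cluster lock");
}
```

This serializes cluster lifetimes instead of relying on runner settings, at the cost of losing parallelism between projects.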
Hi @ReubenBond,
Any updates on this problem? I tried using a script to run each test project one at a time, so nothing runs in parallel, but "Address already in use" still appears. Since you mentioned that ports can remain unusable for a while after they are closed, is there a way to avoid that? I mean, can we release the random ports after a project's tests finish so that they can be opened again as if they had never been used? Thank you.
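One workaround that does not depend on the OS releasing a port sooner is to treat "address already in use" as retryable and move to another port. The sketch below shows the pattern with plain .NET sockets; `BindWithRetry` is a hypothetical helper. In a test fixture, the analogous step would be to catch the failure from Deploy (the trace in this issue shows an AddressInUseException wrapping a SocketException) and rebuild the cluster on different ports. That adaptation is an assumption and has not been checked against Orleans 3.5.0.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper: try consecutive candidate ports and skip any port for which the OS
// reports AddressAlreadyInUse, instead of failing on the first clash.
static TcpListener BindWithRetry(int firstPort, int maxAttempts)
{
    for (int attempt = 0; attempt < maxAttempts; attempt++)
    {
        var candidate = new TcpListener(IPAddress.Loopback, firstPort + attempt);
        try
        {
            candidate.Start(); // binds and listens; throws if the port is taken
            return candidate;
        }
        catch (SocketException ex) when (ex.SocketErrorCode == SocketError.AddressAlreadyInUse)
        {
            // Port is busy (another test run or process); try the next one.
        }
    }
    throw new InvalidOperationException(
        $"No free port in [{firstPort}, {firstPort + maxAttempts - 1}]");
}

// Usage: occupy an OS-assigned port, then ask the helper to start from that same port.
var blocker = new TcpListener(IPAddress.Loopback, 0);
blocker.Start();
int busyPort = ((IPEndPoint)blocker.LocalEndpoint).Port;

var listener = BindWithRetry(busyPort, maxAttempts: 10);
int boundPort = ((IPEndPoint)listener.LocalEndpoint).Port;
Console.WriteLine($"Port {busyPort} was busy; bound {boundPort} instead");

listener.Stop();
blocker.Stop();
```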
We've moved this issue to the Backlog. This means that it is not going to be worked on for the coming release. We review items in the backlog at the end of each milestone/release and depending on the team's priority we may reconsider this issue for the following milestone.