rabbitmq-dotnet-client
Synchronous operations like CreateModel block their thread waiting for a response to an async write that the blocked thread itself may be holding up.
Describe the bug
CreateModel writes to the async write channel in the SocketFrameHandler and then blocks waiting for a response. If you queue up enough of these operations, they use up all the available threads in the thread pool. The thread pool starts provisioning more threads in response, but once you're over your minimum thread count it does so relatively slowly, and you can find yourself timing out before it has gotten around to processing the write queue.
I encountered this issue upgrading from 5.2 to 6.5. In my case I am using a library called Akka.NET to manage background jobs. I create an actor (aka a thread with a message pump) per job, and based on the documentation I figured I needed a channel per job. Over time jobs accumulated in my test environment, and now when my app starts up it attempts to create over 100 channels simultaneously.
Note: this applies to all synchronous operations, not just CreateModel. I can reproduce it with QueueDeclare, for example.
In retrospect I would not have implemented my app this way, for several reasons, and in the medium term I plan to centralize my rabbit interactions and not have so many threads interacting with the library directly. However, I do think the issue is worth considering fixing because:
- I can see how someone else could end up here.
- The library is not being a good ThreadPool citizen when it ties up pool threads and relies on thread pool headroom to get them unblocked.
- It's kind of a bear to figure out what's going on when you encounter the problem.
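To make the failure mode concrete, here is a stripped-down sketch of the pattern (this is not the library's actual code; the channel, the drain loop, and BlockingOperation are stand-ins I made up): a blocking call enqueues work for a loop that itself needs a thread pool thread, so once every pool thread is parked in a blocking call, progress is gated by the pool's slow thread injection.
// Simplified stand-in for the pattern, assuming a .NET 6+ top-level-statements console app.
using System.Threading.Channels;

ThreadPool.SetMinThreads(2, 2);

var writes = Channel.CreateUnbounded<TaskCompletionSource>();

// Stand-in for the write loop: drains the channel and completes each pending call.
// Its continuations run on thread pool threads, like any other async work.
_ = Task.Run(async () =>
{
    await foreach (var pending in writes.Reader.ReadAllAsync())
        pending.SetResult();
});

// Stand-in for a synchronous operation like CreateModel: enqueue a "write",
// then block the calling thread until the loop above signals completion.
void BlockingOperation()
{
    var tcs = new TaskCompletionSource(TaskCreationOptions.RunContinuationsAsynchronously);
    writes.Writer.TryWrite(tcs);
    tcs.Task.Wait(); // holds a thread pool thread the whole time
}

// Queue more blocking calls than the pool has threads; the drain loop is starved
// and progress depends on the pool slowly injecting extra threads.
var tasks = Enumerable.Range(1, 100).Select(_ => Task.Run(BlockingOperation)).ToArray();
Task.WaitAll(tasks);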
Reproduction steps
https://github.com/michac/RabbitThreadPoolTest
// See https://aka.ms/new-console-template for more information
using RabbitMQ.Client;

Console.WriteLine("Hello, World!");

var factory = new ConnectionFactory()
{
    Password = "password",
    UserName = "tester",
    Port = 5672,
    HostName = "localhost"
};

var connection = factory.CreateConnection();

ThreadPool.SetMinThreads(2, 2);

var tasks = Enumerable.Range(1, 100)
    .Select(i => Task.Run(() =>
    {
        WriteThreadPoolStatus(i);
        Console.WriteLine($"Creating Model #{i}...");
        var model = connection.CreateModel();
        Console.WriteLine($"Model #{i} Ready");
        return model;
    })).ToArray();

// If you mark all the CreateModel Tasks as LongRunning, they don't
// block ThreadPool threads and it fixes congestion.
// var tasks = Enumerable.Range(1, 100)
//     .Select(i => Task.Factory.StartNew(() =>
//     {
//         WriteThreadPoolStatus(i);
//         Console.WriteLine($"Creating Model #{i}...");
//         var model = connection.CreateModel();
//         Console.WriteLine($"Model #{i} Ready");
//         return model;
//     }, TaskCreationOptions.LongRunning)).ToArray();

Task.WaitAll(tasks.ToArray<Task>());

WriteThreadPoolStatus(0);
void WriteThreadPoolStatus(int id)
{
    Console.WriteLine($"Id[{id}]: ThreadCount={ThreadPool.ThreadCount}");
    Console.WriteLine($"Id[{id}]: WorkItems: Pending={ThreadPool.PendingWorkItemCount}, Completed={ThreadPool.CompletedWorkItemCount}");

    ThreadPool.GetMinThreads(out var minWorkerCount, out var _);
    ThreadPool.GetMaxThreads(out var maxWorkerCount, out var _);
    Console.WriteLine($"Id[{id}]: Configuration: Min={minWorkerCount}, Max={maxWorkerCount}");

    ThreadPool.GetAvailableThreads(out var availableWorkerCount, out var _);

    // "Santa" threads are the available threads within the min count, which the pool
    // hands out immediately; "Grinch" threads are the ones above the min count, which
    // it only injects slowly.
    var grinchThreads = maxWorkerCount - minWorkerCount;
    var santaThreads = Math.Max(0, availableWorkerCount - grinchThreads);
    Console.WriteLine($"Id[{id}]: AvailableThreads: FastProvision={santaThreads}, SlowProvision={availableWorkerCount - santaThreads}");
}
Expected behavior
Operations like CreateModel are able to complete even if the ThreadPool is too busy to provide a second thread.
I have a few ideas for fixing the problem:
- I am remediating it for now (pending any refactoring on my part) by wrapping every synchronous operation in a LongRunning Task (see the sketch after this list). That takes the blocking call off the ThreadPool so it can't clog things up, though I think it creates a lot of unnecessary threads.
- I updated the library to run the WriteLoop in SocketFrameHandler on a LongRunning thread and made it completely synchronous. That solved the problem because writes could go through even when the ThreadPool was saturated.
- If the problem isn't worth fixing, it would be nice to make it easier to identify: for example, the WriteLoop could detect that it is pulling stale writes off its channel and log an error or otherwise signal a problem.
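For reference, the wrapper from the first bullet looks roughly like this (simplified; the RunOffPool helper name is just for illustration):
// Rough sketch of the interim workaround: run each blocking client call on a
// LongRunning task so the scheduler gives it a dedicated thread instead of
// parking a thread pool thread while it waits.
using RabbitMQ.Client;

var factory = new ConnectionFactory { HostName = "localhost", UserName = "tester", Password = "password" };
var connection = factory.CreateConnection();

static Task<T> RunOffPool<T>(Func<T> blockingCall) =>
    Task.Factory.StartNew(blockingCall,
                          CancellationToken.None,
                          TaskCreationOptions.LongRunning,
                          TaskScheduler.Default);

var model = await RunOffPool(() => connection.CreateModel());
var ok = await RunOffPool(() => model.QueueDeclare("my-queue", true, false, false, null));
Console.WriteLine($"Declared {ok.QueueName}");
The downside is exactly what the first bullet says: every concurrent blocking call gets its own thread, so this trades pool starvation for extra threads.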
Additional context
Problems like this should go away once the library is async across the board (which sounds like the direction you are headed).