Random freezes in async mode
Description
Hello,
I run into a problem when I use async functions. An async call is not executed immediately; it only starts once the call that is currently executing has finished. It looks like a freeze.
In my case I have one async call that takes a long time. In the meantime I issue other async calls from different threads. When the problem occurs, they only start executing after the first call has timed out.
My email address in case of questions : [email protected].
Regards, Wojtek
Example/How to Reproduce
Use async functions. The problem is very difficult to reproduce: it is timing dependent, behaves differently on different machines, and is sometimes not reproducible at all.
Possible Fix
The problem is that the field cur_thread_num in ThreadPool becomes negative. I fixed it in the following way:
void ThreadPool::DelThread(std::thread::id id, bool change_idle_thread_num) {
    const std::chrono::steady_clock::time_point now = std::chrono::steady_clock::now();
    thread_mutex.lock();
    --cur_thread_num;
    if (change_idle_thread_num) // <--- this condition is crucial
        --idle_thread_num;

bool ThreadPool::CreateThread() {
    ...
    DelThread(std::this_thread::get_id(), !initialRun);
During the initial run, idle_thread_num should not be decremented, because it was never incremented. The logic is similar to the check a few lines below:
if (!initialRun) {
    --idle_thread_num;
}
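To make the counter argument above concrete, here is a minimal toy model. It is not the real cpr ThreadPool: ToyPool, AddThread as written here, and the helper IdleCountAfterInitialRunExit are hypothetical names for illustration only. It assumes a freshly created worker is counted in cur_thread_num but not in idle_thread_num, and that the worker is removed while still in its initial run.

```cpp
#include <cassert>
#include <mutex>

// Toy model of the two counters discussed above; NOT the real cpr::ThreadPool.
struct ToyPool {
    int cur_thread_num = 0;
    int idle_thread_num = 0;
    std::mutex thread_mutex;

    // Assumption: a new worker is counted as a thread, but not as an idle one.
    void AddThread() {
        std::lock_guard<std::mutex> lock(thread_mutex);
        ++cur_thread_num;
    }

    // Mirrors the proposed fix: the idle counter is only touched on request.
    void DelThread(bool change_idle_thread_num) {
        std::lock_guard<std::mutex> lock(thread_mutex);
        --cur_thread_num;
        if (change_idle_thread_num) {
            --idle_thread_num;
        }
    }
};

// Hypothetical helper: create one worker and remove it during its initial run.
// Returns the value of idle_thread_num afterwards.
int IdleCountAfterInitialRunExit(bool apply_fix) {
    ToyPool pool;
    pool.AddThread();
    const bool initialRun = true;
    // Without the fix the idle counter is always decremented;
    // with the fix it is decremented only when this is not the initial run.
    pool.DelThread(apply_fix ? !initialRun : true);
    return pool.idle_thread_num;
}
```

In this model, IdleCountAfterInitialRunExit(false) leaves the idle counter at -1, while IdleCountAfterInitialRunExit(true) leaves it at 0. Whether the real pool reaches this state depends on timing, which would be consistent with the problem being hard to reproduce.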
Where did you get it from?
GitHub (branch e.g. master)
Additional Context/Your Environment
- OS: Windows
- Version: 1.11.1, but it also behaved the same way on 1.11.2 (latest one).
@lewar-w thanks for reporting! Yeah, the current thread pool implementation is not so good. I started a replacement implementation, and the only problem I have there is that the macOS tests fail, most likely because of problems caused by the socket implementation on macOS.
Could you please check whether this MR fixes your problem: https://github.com/libcpr/cpr/pull/1168
Yes. I will try to do it during this week.
I tested that branch and it looks like the problem does not occur there. Usually it happened about once in 10 tries. I tested it 25 times without a problem.
Yes, that's the expected broken behaviour of the current implementation.