ltp icon indicating copy to clipboard operation
ltp copied to clipboard

Possible communication deadlocks (Discussion)

Open ryancaicse opened this issue 2 years ago • 3 comments

Hi, in the below code, it is possible for the thread to signal first (pthread_cond_broadcast at Line 108) and, later, the thread goes to the waiting site forever (at Line 50 in the function t1_func) without signal notified anymore.

We notice that there is a sleep (sleep(2); at Line 96) to ensure the main thread go the broadcast site "later". However, the child thread may still go the wait site later, in spite of the lower possibilitiy.

I also noticed that there are other places in the project using the pthread_cond_wait similar to this.

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_wait/3-1.c#L49-L54

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_wait/3-1.c#L96-L110

To avoid this problem, the recommended usage of thread_cond_signal and thread_cond_wait would be

pthread_mutex_t count_lock;
pthread_cond_t count_nonzero;
unsigned count;

decrement_count()
{
    pthread_mutex_lock(&count_lock);
    while (count == 0)
        pthread_cond_wait(&count_nonzero, &count_lock);
    count = count - 1;
    pthread_mutex_unlock(&count_lock);
}

increment_count()
{
    pthread_mutex_lock(&count_lock);
    if (count == 0)
        pthread_cond_signal(&count_nonzero);
    count = count + 1;
    pthread_mutex_unlock(&count_lock);
}

ryancaicse avatar Feb 27 '23 09:02 ryancaicse

Synchronization by sleep() is of course broken by design and needs to be fixed. However I do not get at all what are you recommending, looks like the piece of code was copied from a textbook and it's not obvious how it should be applied to the test if at all.

I think that obvious solution here is actually loop and retry the pthread_cond_broadcast() with a small sleep (something as 10 or 100ms) between retries until the child wakes up or until we are out of time.

metan-ucw avatar Feb 27 '23 09:02 metan-ucw

Hi, thank you for your reply. Let me explain the recommended code. The idea is to use an additional variable to ensure that, if the thread has sent the signal, the other thread should not enter the waiting loop, thereby avoiding the deadlock (waiting forever).

More specifically, if the thread (executing increment_count ) has reached the signal site, the count would be plus 1. As a result, the other thread (executing decrement_count) cannot enter the while loop and wait. I indeed copyed the recommended pattern from this.

I agree with you that there are many ways to avoid the deadlock situations.

ryancaicse avatar Feb 27 '23 10:02 ryancaicse

I roughly point out some other similar places, which also use sleep to synchronize. I hope this might help you.

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_wait/1-1.c#L50-L57

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_broadcast/2-1.c#L45-L50

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_wait/2-1.c#L50-L55

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_signal/2-1.c#L60-L66

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_broadcast/1-1.c#L44-L52

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/open_posix_testsuite/conformance/interfaces/pthread_cond_broadcast/4-1.c#L44-L50

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/kernel/fs/openfile/openfile.c#L247-L252

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/realtime/func/pi-tests/testpi-4.c#L97-L99

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/realtime/func/prio-wake/prio-wake.c#L152-L162

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/realtime/func/pi-tests/testpi-2.c#L97-L99

https://github.com/linux-test-project/ltp/blob/3e0e60074495e1c350b634848e5dd1f7993ec2ef/testcases/realtime/func/pi-tests/testpi-1.c#L95-L97

ryancaicse avatar Feb 27 '23 11:02 ryancaicse