nuttx
nuttx copied to clipboard
libc/semaphore:sem_init change defult protocol
Semaphores are generally used in two ways, one is to count resources or notify event, the other is to use as a mutual exclusion lock In Nuttx, the default behavior of semaphores is mutual exclusion lock. I think it is better to change the default behavior to count resources or notify event, and it will be more general in Linux programs.
Why would I do this: In porting some programs which confirm POSIX standard, the semaphore is used to notify event, but its default behavior is a mutual exclusion lock(nuttx): One thread post the semaphore to indicate something happen, another thread gets the semaphore and exits. when the first thread post the semaphore again which will cause a crash when CONFIG_PRIORITY_INHERITANCE is enable, because htcb field in semholder_s point to a freed memory.
so, it's better to change the default sem behavior to no priority inheritance.
This is a massive breaking change. Please do not merge this until there is time after the holidays to get proper feed back from the reviews listed.
Yes, this is a partial change to show the issue and potential modification. If the community agree the fix after review, all [nx]sem_init need review and modify.
This will also break many things for downstream projects using NuttX.
We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.
To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:
- If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.
- If this is Linux behavior and is in contradiction to POSIX, then we should not do it.
Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.
This will also break many things for downstream projects using NuttX.
Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.
We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.
Sure.
To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:
- If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.
As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
static sem_t g_sem;
static void *thread1_cb(void *arg)
{
sem_wait(&g_sem);
return NULL;
}
static void *thread2_cb(void *arg)
{
sleep(2);
sem_wait(&g_sem);
return NULL;
}
static void *thread3_cb(void *arg)
{
sleep(1);
sem_post(&g_sem);
sleep(2);
sem_post(&g_sem);
return NULL;
}
int main(int argc, char *argv[])
{
sem_init(&g_sem);
thread1 = pthread_create(thread1_cb...);
thread2 = pthread_create(thread2_cb...);
thread3 = pthread_create(thread3_cb...);
pthread_join(&thread1);
pthread_join(&thread2);
pthread_join(&thread3);
sem_destroy(&g_sem);
return 0;
}
So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).
- If this is Linux behavior and is in contradiction to POSIX, then we should not do it.
Of course.
Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.
Sure.
As mentioned elsewhere, there are likely other parts of our tree that need review/changes.
Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.
Ok, let's wait your research.
Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.
It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was. But allow the default to be changed if someone is using Linux code semantics.
Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.
Why non default? semaphore use as signaling is a normal practice in POSIX. On the other hand, most program use POSIX pthread mutex as lock, so semaphore seldom use as lock except NuttX kernel. I have to say that it's very bad choice to enable priority inheritance by default when priority inheritance get implemented, because:
- It's an optional feature, which mean that the code which use sem_t but not call nxsem_set_protocol(NuttX specific), the behavior will change with/without enable this feature.
- Since priority inheritance is the default setting, which mean the wrong usage will crash the system. It's more serious than the priority inversion.
It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was.
Kconfig is good to disable/enable the optional feature, but it isn't a good solution here. Do you want sem_t change it's behavior by an option? And all place which call sem_init/nxsem_init need check the option and call nxsem_set_protocol with the different value.
But allow the default to be changed if someone is using Linux code semantics.
First, it isn't Linux semantics. Second, my demo show that isn't a minor issue because the program will crash the system.
Seems that disabling priority inheritance by default is the right way of doing things. My arguments are that by default priority inheritance is used to struggle with resource guarding issue (mutual exclusion use case) and does not have much sense in case of event signaling use case. The POSIX semaphores however have have more wide range then mutual exclusion use case and it is more a coincidence that phtread_mutex
s are implemented on top of POSIX semaphores in NuttX (I just want to emphasize that phtread_mutex
s could have their own implementation as an alternative and are not tight together with semaphores in general). pthread_mutex
s will still have priority inheritance enabled by default and that is fine because I truly believe that when NuttX user enables priority inheritance that is the exactly use case that needs to be handled by this feature.
Summarizing above: my vote is to change the default protocol for POSIX semaphores and proceed with this change.
This will also break many things for downstream projects using NuttX.
Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.
We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.
Sure.
To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:
- If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.
As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:
#include <pthread.h> #include <semaphore.h> #include <unistd.h> static sem_t g_sem; static void *thread1_cb(void *arg) { sem_wait(&g_sem); return NULL; } static void *thread2_cb(void *arg) { sleep(2); sem_wait(&g_sem); return NULL; } static void *thread3_cb(void *arg) { sleep(1); sem_post(&g_sem); sleep(2); sem_post(&g_sem); return NULL; } int main(int argc, char *argv[]) { sem_init(&g_sem); thread1 = pthread_create(thread1_cb...); thread2 = pthread_create(thread2_cb...); thread3 = pthread_create(thread3_cb...); pthread_join(&thread1); pthread_join(&thread2); pthread_join(&thread3); sem_destroy(&g_sem); return 0; }
After successfully sem_wait, the task becomes the semphore's holder. Then the task exit, but not be removed from the semphore's holder list, so it will crash if visit this holder in the future. I will do a commit to fix this problem.
So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).
- If this is Linux behavior and is in contradiction to POSIX, then we should not do it.
Of course.
Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.
Sure.
As mentioned elsewhere, there are likely other parts of our tree that need review/changes. Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.
Ok, let's wait your research.
@zhaoxiu-zeng how do you plan to fix this? I was looking into the code as well and was looking for a solution, but didn't found a lightweight solution. The code only idea that I had was to create a global list of semaphores that have at least one holder and iterate that list on task exit in order to find a semaphore that contain task in the holder list or create a list of semaphores in task control block and add semaphore to that list when task is added as a semaphore holder. But both seems to me quite "heavy" in terms of both memory and performance.
@anjiahao1 has a simple fix which change tcb_s to pid, so we can detect the abnormal case and remove the orphan node easily. But, we want to fix the root cause first, and then add the safe guard in case of the wrong usage.
@xiaoxiang781216 changing tcb_s
to pid
adds at least some reliability, but still seems to be not bulletproof. In such case I would suggest to store both tcb_s
and pid
, so we can use next mechanism to improve validity check: if (pholder->htcb == nxsched_get_tcb(pholder->pid))
, so the probability that same memory block will be allocated for a task and same pid
will be assigned is relatively low. In this case we can insert a sanity check and "dead" nodes clean-up into each semaphore operation API call.
Here is a real case: https://github.com/apache/incubator-nuttx-apps/pull/960
Another real case: https://github.com/apache/incubator-nuttx/pull/5170, both case will generate hard fault.
@pkarashchenko I have push the modification, let's discuss in the PR.
https://github.com/apache/incubator-nuttx/pull/5171
Another real case: #5170, both case will generate hard fault.
I run ostest with https://github.com/apache/incubator-nuttx/pull/5171, no such assertions were found.
I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most.
The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level.
If only one holder is supported, many things can be simplified.
Another real case: #5170, both case will generate hard fault.
I run ostest with https://github.com/apache/incubator-nuttx/pull/5171, no such assertions were found.
I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most.
The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level.
If only one holder is supported, many things can be simplified.
The semaphores are the building blocks for the pthread mutexes and here is some description for pthread mutexes:
When a thread makes a call to pthread_mutex_lock(), the mutex was initialized with the protocol attribute having the value PTHREAD_PRIO_INHERIT, when the calling thread is blocked because the mutex is owned by another thread, that owner thread shall inherit the priority level of the calling thread as long as it continues to own the mutex. The implementation shall update its execution priority to the maximum of its assigned priority and all its inherited priorities. Furthermore, if this owner thread itself becomes blocked on another mutex with the protocol attribute having the value PTHREAD_PRIO_INHERIT, the same priority inheritance effect shall be propagated to this other owner thread, in a recursive manner.
Another real case: #5170, both case will generate hard fault.
I run ostest with #5171, no such assertions were found. I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most. The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level. If only one holder is supported, many things can be simplified.
The semaphores are the building blocks for the pthread mutexes and here is some description for pthread mutexes:
I mean when the semphore is used as mutex. If not, priority inheritance is not necessary.
When a thread makes a call to pthread_mutex_lock(), the mutex was initialized with the protocol attribute having the value PTHREAD_PRIO_INHERIT, when the calling thread is blocked because the mutex is owned by another thread, that owner thread shall inherit the priority level of the calling thread as long as it continues to own the mutex. The implementation shall update its execution priority to the maximum of its assigned priority and all its inherited priorities. Furthermore, if this owner thread itself becomes blocked on another mutex with the protocol attribute having the value PTHREAD_PRIO_INHERIT, the same priority inheritance effect shall be propagated to this other owner thread, in a recursive manner.
I started similar pthread mutex
related activity in https://github.com/apache/incubator-nuttx/pull/5180
Related to https://github.com/apache/incubator-nuttx/issues/6084
@anjiahao1 let rebase this patch and fix the conflict.
@pkarashchenko @hartmannathan @masayuki2009 this is the last step to correct the default value of priority inheritance, please review this patch and https://github.com/apache/incubator-nuttx/pull/7344.
How is this PR related to PR #7344 ? Should they be combined to a single PR?
How is this PR related to PR #7344 ? Should they be combined to a single PR?
ok,i will use single PR and add documentation later
this is the last patchset to fix the default value of priority inheritance, please review them, @hartmannathan @davids5 @masayuki2009 @pkarashchenko . All related changes are recorded in https://github.com/apache/incubator-nuttx/issues/5184.
Before, mutual exclusion semaphores had priority inheritance enabled by default; now it is opt-in.
So I wonder if the opposite of commit 48d8383 ("sem:remove sem default protocl") should be done also: Locate all mutual exclusion semaphores and call (nx)sem_set_protocol(..., SEM_PRIO_INHERIT)?
@hartmannathan it was done by converting all mutual exclusion semaphores to mutex lock in this PR https://github.com/apache/incubator-nuttx/pull/6320.
#6965 this patch repaced all sem use as lock with mutex
Great! This is a much cleaner design. Thank you!
@xiaoxiang781216
I think we need to merge https://github.com/apache/incubator-nuttx-apps/pull/1358 as well.
Otherwise, qemu-intel64:ostest
would fail.
I have documented the mitigation entry for the next release's Release Notes here: https://cwiki.apache.org/confluence/display/NUTTX/NuttX+11.1.0#NuttX11.1.0-DefaultSemaphoreProtocolChanged
Please check and let me know if there are any mistakes!
The description looks good, @hartmannathan thanks.