nuttx icon indicating copy to clipboard operation
nuttx copied to clipboard

libc/semaphore:sem_init change defult protocol

Open anjiahao1 opened this issue 3 years ago • 18 comments

Semaphores are generally used in two ways, one is to count resources or notify event, the other is to use as a mutual exclusion lock In Nuttx, the default behavior of semaphores is mutual exclusion lock. I think it is better to change the default behavior to count resources or notify event, and it will be more general in Linux programs.

Why would I do this: In porting some programs which confirm POSIX standard, the semaphore is used to notify event, but its default behavior is a mutual exclusion lock(nuttx): One thread post the semaphore to indicate something happen, another thread gets the semaphore and exits. when the first thread post the semaphore again which will cause a crash when CONFIG_PRIORITY_INHERITANCE is enable, because htcb field in semholder_s point to a freed memory.

so, it's better to change the default sem behavior to no priority inheritance.

anjiahao1 avatar Dec 24 '21 02:12 anjiahao1

This is a massive breaking change. Please do not merge this until there is time after the holidays to get proper feed back from the reviews listed.

Yes, this is a partial change to show the issue and potential modification. If the community agree the fix after review, all [nx]sem_init need review and modify.

xiaoxiang781216 avatar Dec 24 '21 15:12 xiaoxiang781216

This will also break many things for downstream projects using NuttX.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.
  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

hartmannathan avatar Dec 24 '21 16:12 hartmannathan

This will also break many things for downstream projects using NuttX.

Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

Sure.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.

As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:

#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t g_sem;

static void *thread1_cb(void *arg)
{
  sem_wait(&g_sem);
  return NULL;
}

static void *thread2_cb(void *arg)
{
  sleep(2);
  sem_wait(&g_sem);
  return NULL;
}

static void *thread3_cb(void *arg)
{
  sleep(1);
  sem_post(&g_sem);
  sleep(2);
  sem_post(&g_sem);
  return NULL;
}

int main(int argc, char *argv[])
{
  sem_init(&g_sem);

  thread1 = pthread_create(thread1_cb...);
  thread2 = pthread_create(thread2_cb...);
  thread3 = pthread_create(thread3_cb...);

  pthread_join(&thread1);
  pthread_join(&thread2);
  pthread_join(&thread3);

  sem_destroy(&g_sem);
  return 0;
}

So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).

  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Of course.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

Sure.

As mentioned elsewhere, there are likely other parts of our tree that need review/changes.

Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.

Ok, let's wait your research.

xiaoxiang781216 avatar Dec 24 '21 17:12 xiaoxiang781216

Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.

It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was. But allow the default to be changed if someone is using Linux code semantics.

davids5 avatar Dec 24 '21 19:12 davids5

Drivers that use the semaphores for signaling should not be involved with priority inheritance (when enabled). That use case was assumed to be the non default.

Why non default? semaphore use as signaling is a normal practice in POSIX. On the other hand, most program use POSIX pthread mutex as lock, so semaphore seldom use as lock except NuttX kernel. I have to say that it's very bad choice to enable priority inheritance by default when priority inheritance get implemented, because:

  1. It's an optional feature, which mean that the code which use sem_t but not call nxsem_set_protocol(NuttX specific), the behavior will change with/without enable this feature.
  2. Since priority inheritance is the default setting, which mean the wrong usage will crash the system. It's more serious than the priority inversion.

It seams to me this change can be done with adding an Kconfig option for the default and leave it as it was.

Kconfig is good to disable/enable the optional feature, but it isn't a good solution here. Do you want sem_t change it's behavior by an option? And all place which call sem_init/nxsem_init need check the option and call nxsem_set_protocol with the different value.

But allow the default to be changed if someone is using Linux code semantics.

First, it isn't Linux semantics. Second, my demo show that isn't a minor issue because the program will crash the system.

xiaoxiang781216 avatar Dec 25 '21 06:12 xiaoxiang781216

Seems that disabling priority inheritance by default is the right way of doing things. My arguments are that by default priority inheritance is used to struggle with resource guarding issue (mutual exclusion use case) and does not have much sense in case of event signaling use case. The POSIX semaphores however have have more wide range then mutual exclusion use case and it is more a coincidence that phtread_mutexs are implemented on top of POSIX semaphores in NuttX (I just want to emphasize that phtread_mutexs could have their own implementation as an alternative and are not tight together with semaphores in general). pthread_mutexs will still have priority inheritance enabled by default and that is fine because I truly believe that when NuttX user enables priority inheritance that is the exactly use case that needs to be handled by this feature. Summarizing above: my vote is to change the default protocol for POSIX semaphores and proceed with this change.

pkarashchenko avatar Dec 29 '21 21:12 pkarashchenko

This will also break many things for downstream projects using NuttX.

Only when the project enable CONFIG_PRIORITY_INHERITANCE, otherwise no difference before and after apply this patch.

We need to make sure the wider community notices this proposed change so it can be reviewed properly. I will post on the mailing list, but there are other venues, so to everyone frequenting them, please help draw attention to this there, too.

Sure.

To be clear, I am in favor of standards-compliance, provided this is in line with NuttX's Inviolables:

  • If it is standard POSIX behavior to do as this change proposes, then we should do so, but only after making sure it is reviewed properly.

As far as I know, POSIX never mention the priority inheritance in spec, so it's the special behavior implemented by NuttX to improve the real time capability. But, sem_init without nxsem_set_protocol(SEM_PRIO_NONE) activate the priority inheritance by default which make the follow simple POSIX compliant program crash after the second sem_post:

#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>

static sem_t g_sem;

static void *thread1_cb(void *arg)
{
  sem_wait(&g_sem);
  return NULL;
}

static void *thread2_cb(void *arg)
{
  sleep(2);
  sem_wait(&g_sem);
  return NULL;
}

static void *thread3_cb(void *arg)
{
  sleep(1);
  sem_post(&g_sem);
  sleep(2);
  sem_post(&g_sem);
  return NULL;
}

int main(int argc, char *argv[])
{
  sem_init(&g_sem);

  thread1 = pthread_create(thread1_cb...);
  thread2 = pthread_create(thread2_cb...);
  thread3 = pthread_create(thread3_cb...);

  pthread_join(&thread1);
  pthread_join(&thread2);
  pthread_join(&thread3);

  sem_destroy(&g_sem);
  return 0;
}

After successfully sem_wait, the task becomes the semphore's holder. Then the task exit, but not be removed from the semphore's holder list, so it will crash if visit this holder in the future. I will do a commit to fix this problem.

So, it's very dangerous to enable priority inheritance without explicit indicator(e.g. a function call or an initial flag).

  • If this is Linux behavior and is in contradiction to POSIX, then we should not do it.

Of course.

Also, if this change is implemented, we need to increment the major version number. It will have to be NuttX 11.x.

Sure.

As mentioned elsewhere, there are likely other parts of our tree that need review/changes. Regarding whether we should do this or not, I cannot offer an opinion until I re-read POSIX details regarding semaphores to see if this is correct per our Inviolable Principles.

Ok, let's wait your research.

zhaoxiu-zeng avatar Dec 30 '21 10:12 zhaoxiu-zeng

@zhaoxiu-zeng how do you plan to fix this? I was looking into the code as well and was looking for a solution, but didn't found a lightweight solution. The code only idea that I had was to create a global list of semaphores that have at least one holder and iterate that list on task exit in order to find a semaphore that contain task in the holder list or create a list of semaphores in task control block and add semaphore to that list when task is added as a semaphore holder. But both seems to me quite "heavy" in terms of both memory and performance.

pkarashchenko avatar Dec 30 '21 12:12 pkarashchenko

@anjiahao1 has a simple fix which change tcb_s to pid, so we can detect the abnormal case and remove the orphan node easily. But, we want to fix the root cause first, and then add the safe guard in case of the wrong usage.

xiaoxiang781216 avatar Dec 30 '21 12:12 xiaoxiang781216

@xiaoxiang781216 changing tcb_s to pid adds at least some reliability, but still seems to be not bulletproof. In such case I would suggest to store both tcb_s and pid, so we can use next mechanism to improve validity check: if (pholder->htcb == nxsched_get_tcb(pholder->pid)), so the probability that same memory block will be allocated for a task and same pid will be assigned is relatively low. In this case we can insert a sanity check and "dead" nodes clean-up into each semaphore operation API call.

pkarashchenko avatar Dec 30 '21 13:12 pkarashchenko

Here is a real case: https://github.com/apache/incubator-nuttx-apps/pull/960

xiaoxiang781216 avatar Jan 05 '22 10:01 xiaoxiang781216

Another real case: https://github.com/apache/incubator-nuttx/pull/5170, both case will generate hard fault.

xiaoxiang781216 avatar Jan 05 '22 11:01 xiaoxiang781216

@pkarashchenko I have push the modification, let's discuss in the PR.

https://github.com/apache/incubator-nuttx/pull/5171

zhaoxiu-zeng avatar Jan 05 '22 11:01 zhaoxiu-zeng

Another real case: #5170, both case will generate hard fault.

I run ostest with https://github.com/apache/incubator-nuttx/pull/5171, no such assertions were found.

I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most.

The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level.

If only one holder is supported, many things can be simplified.

zhaoxiu-zeng avatar Jan 05 '22 11:01 zhaoxiu-zeng

Another real case: #5170, both case will generate hard fault.

I run ostest with https://github.com/apache/incubator-nuttx/pull/5171, no such assertions were found.

I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most.

The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level.

If only one holder is supported, many things can be simplified.

The semaphores are the building blocks for the pthread mutexes and here is some description for pthread mutexes:

When a thread makes a call to pthread_mutex_lock(), the mutex was initialized with the protocol attribute having the value PTHREAD_PRIO_INHERIT, when the calling thread is blocked because the mutex is owned by another thread, that owner thread shall inherit the priority level of the calling thread as long as it continues to own the mutex. The implementation shall update its execution priority to the maximum of its assigned priority and all its inherited priorities. Furthermore, if this owner thread itself becomes blocked on another mutex with the protocol attribute having the value PTHREAD_PRIO_INHERIT, the same priority inheritance effect shall be propagated to this other owner thread, in a recursive manner.

pkarashchenko avatar Jan 05 '22 22:01 pkarashchenko

Another real case: #5170, both case will generate hard fault.

I run ostest with #5171, no such assertions were found. I think priority inheritance should only be valid for mutex, so there is only one holder per semphore at most. The current implementation still cannot avoid priority flipping completely. For example, there are 3 tasks t2 > t1 > t0, t0 held sem1, t1 held sem2 and wait sem1, then t2 wait sem2. In this case, only t1's priority is raised. If multiple holders are supported, this problem is very difficult to solve, which is the O(n^2) level. If only one holder is supported, many things can be simplified.

The semaphores are the building blocks for the pthread mutexes and here is some description for pthread mutexes:

I mean when the semphore is used as mutex. If not, priority inheritance is not necessary.

When a thread makes a call to pthread_mutex_lock(), the mutex was initialized with the protocol attribute having the value PTHREAD_PRIO_INHERIT, when the calling thread is blocked because the mutex is owned by another thread, that owner thread shall inherit the priority level of the calling thread as long as it continues to own the mutex. The implementation shall update its execution priority to the maximum of its assigned priority and all its inherited priorities. Furthermore, if this owner thread itself becomes blocked on another mutex with the protocol attribute having the value PTHREAD_PRIO_INHERIT, the same priority inheritance effect shall be propagated to this other owner thread, in a recursive manner.

zhaoxiu-zeng avatar Jan 06 '22 01:01 zhaoxiu-zeng

I started similar pthread mutex related activity in https://github.com/apache/incubator-nuttx/pull/5180

pkarashchenko avatar Jan 06 '22 10:01 pkarashchenko

Related to https://github.com/apache/incubator-nuttx/issues/6084

pkarashchenko avatar Apr 16 '22 22:04 pkarashchenko

@anjiahao1 let rebase this patch and fix the conflict.

xiaoxiang781216 avatar Oct 18 '22 09:10 xiaoxiang781216

@pkarashchenko @hartmannathan @masayuki2009 this is the last step to correct the default value of priority inheritance, please review this patch and https://github.com/apache/incubator-nuttx/pull/7344.

xiaoxiang781216 avatar Oct 18 '22 14:10 xiaoxiang781216

How is this PR related to PR #7344 ? Should they be combined to a single PR?

hartmannathan avatar Oct 19 '22 03:10 hartmannathan

How is this PR related to PR #7344 ? Should they be combined to a single PR?

ok,i will use single PR and add documentation later

anjiahao1 avatar Oct 19 '22 03:10 anjiahao1

this is the last patchset to fix the default value of priority inheritance, please review them, @hartmannathan @davids5 @masayuki2009 @pkarashchenko . All related changes are recorded in https://github.com/apache/incubator-nuttx/issues/5184.

xiaoxiang781216 avatar Oct 19 '22 13:10 xiaoxiang781216

Before, mutual exclusion semaphores had priority inheritance enabled by default; now it is opt-in.

So I wonder if the opposite of commit 48d8383 ("sem:remove sem default protocl") should be done also: Locate all mutual exclusion semaphores and call (nx)sem_set_protocol(..., SEM_PRIO_INHERIT)?

@hartmannathan it was done by converting all mutual exclusion semaphores to mutex lock in this PR https://github.com/apache/incubator-nuttx/pull/6320.

xiaoxiang781216 avatar Oct 20 '22 03:10 xiaoxiang781216

#6965 this patch repaced all sem use as lock with mutex

anjiahao1 avatar Oct 20 '22 03:10 anjiahao1

#6965 this patch repaced all sem use as lock with mutex

Great! This is a much cleaner design. Thank you!

hartmannathan avatar Oct 20 '22 03:10 hartmannathan

@xiaoxiang781216 I think we need to merge https://github.com/apache/incubator-nuttx-apps/pull/1358 as well. Otherwise, qemu-intel64:ostest would fail.

masayuki2009 avatar Oct 22 '22 07:10 masayuki2009

I have documented the mitigation entry for the next release's Release Notes here: https://cwiki.apache.org/confluence/display/NUTTX/NuttX+11.1.0#NuttX11.1.0-DefaultSemaphoreProtocolChanged

Please check and let me know if there are any mistakes!

hartmannathan avatar Oct 25 '22 04:10 hartmannathan

The description looks good, @hartmannathan thanks.

xiaoxiang781216 avatar Oct 25 '22 05:10 xiaoxiang781216