Remove errno accesses from the OS
Propsition
Nothing inside of the OS should access the per-thread errno variable. Why?
- This is a improper separation of responsibilities.
errnois strictly a feature of the user-space vialibs/libc. The OS should perform basic operation, but user space operations underlibs/libcshould be the only place where the errno is maniuplated.
Linux and GLIBC have this same separation of functionality: GLIBC manages the errno variable, not Linux
- Modifying the
errnois dangerous within the OS. Theerrnois kept in TLS which resides on the thread's stack. The OS internals, however, may operate on different kernel threads or even, as Linux does, perform all system calls on a different stack. So, whenever a snippet of code within the OS modifies theerrno, it is assuming implicitly that caller's stack is in effect. But that might not always be the case.
In the KERNEL mode of operation, the user stack may not be accessible unless the correct address environment is instantiated, making modification of the user stack more complex, more risky.
So it would be a positive improvement to the OS if all errno operations were removed from the OS and, instead, performed in user space by logic in libs/libc
Front End Functions
Within the OS, special measures have been taken so that the OS does not modify the errno in most cases. This is done with (1) a front end, user OS interfaces that modify the errno and (2) a separate internal version of the OS interface that does not modify the errno. This latter interface is the one used exclusively within the OS.
The general pattern is like for the following for some_os_interface():
int some_os_interface(int some_parameters)
{
int ret = nx_some_os_interface(some_parameters);
if (ret < 0)
{
set_errno(-ret);
ret = ERROR;
}
return ret;
}
Where nx_some_os_interface() is the internal implementation of the OS interface that does NOT set the errno variable.
This general pattern could be easily moved to user space in /libs/libc. Only a few things would have to change:
- The implementation of
some_os_interface()would have to be separated and moved to a new file under/libs/libc. - In
syscall/*and ininclude/sys/*, all references tosome_os_interface()would have to be changed tonx_some_os_interface().
Easy, right? Only two things make this difficult: (1) there are many such tiny functions that would have to be changed, and (2) cancellation points make things more complex.
Cancellation Points
If some_os_interface() is a cancellation point, then there is more in the front end function. Such a front end function would look more like:
int some_os_interface(int some_parameters)
{
int ret;
enter_cancellation_point();
ret = nx_some_os_interface(some_parameters);
if (ret < 0)
{
set_errno(-ret);
ret = ERROR;
}
leave_cancellation_point();
return ret;
}
And enter/leave_cancellation_point() are OS functions that cannot be called from user-space logic in libs/libc. In this case, some_os_interface() would have to be separated into two functions:
int some_cancellable_os_interface(int some_parameters)
{
int ret;
enter_cancellation_point();
ret = nx_some_os_interface(some_parameters);
leave_cancellation_point();
return ret;
}
And
int some_os_interface(int some_parameters)
{
int ret;
enter_cancellation_point();
ret = some_cancellable_os_interface(some_parameters);
if (ret < 0)
{
set_errno(-ret);
ret = ERROR;
}
leave_cancellation_point();
return ret;
}
- The implementation of
some_os_interface()would have to be separated into two functions and the one that modifies theerrnowould be moved to a new file under/libs/libc. - In
syscall/*and in include/sys/*, all references tosome_os_interface ()would have to be changed tosome_cancelable_os_interface().
Signal Handlers
I have not done any detailed analysis, but the other place where there are many modifications to errno value is in implementations of the function up_sigdeliver(). In that function, it preserves the old errno value, calls the signal handler, then restores the old errno value. This preserves the errno value so that it is not lost due to signal handling logic.
In this case, the signal handler should always be running on the same stack as the thread that it interrupted so errno access should be safe.
However, this logic could simply be removed: It is not the responsibility of the OS signal handling logic to preserve the errno value; it is the responsibility of the signal handler to save and restore the errno value. Per GLIBC documentation under "1.2.2.1 POSIX Safety Concepts" (https://www.gnu.org/software/libc/manual/html_node/POSIX-Safety-Concepts.html):
"AS-Safe or Async-Signal-Safe functions are safe to call from asynchronous signal handlers. AS, in AS-Safe, stands for Asynchronous Signal.
"Many functions that are AS-Safe may set errno, or modify the floating-point environment, because their doing so does not make them unsuitable for use in signal handlers. However, programs could misbehave should asynchronous signal handlers modify this thread-local state, and the signal handling machinery cannot be counted on to preserve it. Therefore, signal handlers that call functions that may set errno or modify the floating-point environment must save their original values, and restore them before returning."
The general pattern is like for the following for
some_os_interface():int some_os_interface(int some_parameters) { int ret = nx_some_os_interface(some_parameters); if (ret < 0) { set_errno(-ret); ret = ERROR; } return ret; }
Why we can't extend mksyscall to insert set_errno inside PROXY_some_os_interface?
And
enter/leave_cancellation_point()are OS functions that cannot be called from user-space logic inlibs/libc. In this case,some_os_interface()would have to be separated into two functions:int some_cancellable_os_interface(int some_parameters) { int ret; enter_cancellation_point(); ret = nx_some_os_interface(some_parameters); leave_cancellation_point(); return ret; }And
int some_os_interface(int some_parameters) { int ret; enter_cancellation_point(); ret = some_cancellable_os_interface(some_parameters); if (ret < 0) { set_errno(-ret); ret = ERROR; } leave_cancellation_point(); return ret; }
Why we can't extend mksyscall to insert [enter|levea]_cancellation_point inside STUB_some_cancellable_os_interface?
With the right tool, we don't need write the similar code again and again, all can be done automaitcally.
- The implementation of
some_os_interface()would have to be separated into two functions and the one that modifies theerrnowould be moved to a new file under/libs/libc.- In
syscall/*and in include/sys/*, all references tosome_os_interface ()would have to be changed tosome_cancelable_os_interface().Signal Handlers
I have not done any detailed analysis, but the other place where there are many modifications to errno value is in implementations of the function
up_sigdeliver(). In that function, it preserves the olderrnovalue, calls the signal handler, then restores the olderrnovalue. This preserves theerrnovalue so that it is not lost due to signal handling logic.
We need consider another common case too: Many function inside OS also call libc function which isn't implemented by kernel but also modify errno. For example:
static int imxrt_rdtime(FAR struct rtc_lowerhalf_s *lower,
FAR struct rtc_time *rtctime)
{
time_t timer;
/* The resolution of time is only 1 second */
timer = up_rtc_time();
/* Convert the one second epoch time to a struct tm */
if (gmtime_r(&timer, (FAR struct tm *)rtctime) == 0)
{
int errcode = get_errno();
DEBUGASSERT(errcode > 0);
rtcerr("ERROR: gmtime_r failed: %d\n", errcode);
return -errcode;
}
return OK;
}
Those are VERY good ideas! Thank you. The is the perfect solution for PROTECTED and KERNEL builds.
FLAT would be a little additional work. We would have to generate proxies and stubs in the FLAT configuration too. There would be no software interrupt in that case, but a direct call from the proxy to the stub. So they would not be the same proxies; they would not call the the syscall_callXX function.
The messy marshalling and type conversion would add a little overhead. We would always have to insert the system call number even if it is not used, for example. Perhaps the special FLAT version of the stubs and proxies would omit the syscall number from the parameter list. Then the call to from the proxy to the stub could be well optimised.
Certainly worth some additional thought.
Yes, for FLAT build can pass a special flag to mksyscall, let PROXY_xxx call STUB_xxx directly without any syscall NR.
We can even have the same function name for both kernel and userspace(FLAT build need some clear method, but it's doable), then the wrong usage(e.g. call sem_post instead nxsem_post inside kernel) can be eliminated automatically.
I'm not sure if https://github.com/apache/incubator-nuttx/pull/6433 is contradicting to what is proposed here. Maybe we can discuss what is the best solution.
I expect that this proposal assumes that POSIX APIs are a system calls, however such approach introduce the implicit cancellation point support. For example the some of libc rely on sem_wait and that makes the users of sem_wait to become a cancellation points implicitly. That is not good. We need to have a system call layer that is agnostic to cancellation points and does not use errno. There are not much of such cases, so maybe having two variants of semaphore system call APIs may be a solution (like nxsem_wait and nxsem_wait_cancellable()) or making one more generic system call API in the middle like nxsem_wait_proxy(FAR sem_t *sem, int oflags); where oflags will be CANCELLABLE, BLOCK, etc., so nxsem_wait, nxsem_trywait, sem_wait and sem_trywait can be implemented calling a proxy. I'm not sure what is the best strategy.
For the most of the cases when API is not a cancellation point it is easy to make NX API as a system call and create a libc wrapper that will handle errno correctly. The real problem is cancellation point APIs (the APIs that pend). The cancellation point enter/leave should be executed n kernel mode, so I do not see any other option that creating a two variants of the same API that differ only with cancellation point management and expose both as system calls.
Again that contradicts assumption that only POSIX APIs are a system calls. Anyway we already have many NX APIs as system calls, so I think we can make whatever is convenient to be used.