solo5
reserved exit codes
from solo5.h: Status values of 255 and above are reserved for use by Solo5.
now, reading some more about exit codes, they seem to be modulo 256 usually -- should the comment instead be "status values of 255 are reserved for use by solo5"? or is it expected that further exit codes are required (and e.g. 250 and above are solo5-specific)?
The comment is a bit misleading. There are two notions of "exit code" involved:
1. The `status` passed to `solo5_exit()`. This is just an `int`.
2. The exit code returned by the tender (hvt) or process (spt) to the host.
   - In the case of hvt, this is done by returning from `hvt_vcpu_loop()`, see here https://github.com/Solo5/solo5/blob/master/tenders/hvt/hvt_kvm_x86_64.c#L194 and here https://github.com/Solo5/solo5/blob/master/tenders/hvt/hvt_main.c#L278.
   - In the case of spt, this is done by the bindings here https://github.com/Solo5/solo5/blob/master/bindings/spt/platform.c#L46
In both cases (2) above, the resulting code will be passed to `exit(2)`, which effectively does `status & 0xff`, so codes >= 256 will be "lost".
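To make the truncation concrete, here is a minimal sketch in plain POSIX C (not Solo5 code; `observed_exit_code` is a made-up helper name). It forks a child that exits with an arbitrary `int` and returns what the parent actually sees:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run a child that exits with `status` and return the exit code the
 * parent observes, or -1 if the child could not be run or did not
 * exit normally. Hypothetical helper, for illustration only. */
int observed_exit_code(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(status);      /* child: only the low 8 bits reach the parent */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

With this, a status of 256 comes back as 0 and 300 comes back as 44, so anything reserved at 256 and above never reaches the host unless the tender handles it before exiting.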
What the special code of `255` is supposed to do is tell the tender (in this case, only hvt) to trigger "other things to do in case of an abort", e.g. dump a guest core file if configured/compiled to do that.
What we should probably be doing instead is reserve codes >= 256 at the Solo5 layer (so as not to steal <= 255 from the application), and/or possibly add a separate ABORT hypercall.
In POSIX you distinguish between application-generated exits and signal (/kernel)-imposed exits. Here is a Stack Overflow post that goes over this with pedantic attention to error code semantics :) https://stackoverflow.com/questions/5149228/return-value-range-of-the-main-function
In Linux this is done by packing the truncated error code (`& 0xff`) like `0xAAbb`, where `bb` is the signal status (`0x80` for coredumps) and `AA` is the application exit code iff `bb == 00`.
When you call `waitpid()` or `wait()` you get an integer. You're supposed to use macros like `WIFEXITED()` / `WEXITSTATUS()` and `WIFSIGNALED()` / `WTERMSIG()` to tell what caused the exit, so that you can distinguish between e.g. `SIGSEGV` and someone just returning with `exit(14)`.
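A rough sketch of that decoding, in plain POSIX C. The names `exit_info`, `decode_wait_status` and `run_child` are invented for this example; only the `W*` macros and the libc calls are real:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum exit_cause { CAUSE_EXITED, CAUSE_SIGNALED, CAUSE_OTHER };

struct exit_info {
    enum exit_cause cause;
    int value;              /* exit code, or signal number if signaled */
};

/* Decode a raw wait status with the standard macros. */
struct exit_info decode_wait_status(int wstatus)
{
    struct exit_info info = { CAUSE_OTHER, 0 };
    if (WIFEXITED(wstatus)) {
        info.cause = CAUSE_EXITED;
        info.value = WEXITSTATUS(wstatus);
    } else if (WIFSIGNALED(wstatus)) {
        info.cause = CAUSE_SIGNALED;
        info.value = WTERMSIG(wstatus);
    }
    return info;
}

/* Fork a child that kills itself with `sig` (if nonzero) or otherwise
 * exits with `code`. Returns the raw wait status, or -1 on failure. */
int run_child(int code, int sig)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0) {
            signal(sig, SIG_DFL);   /* make sure the default action applies */
            raise(sig);
        }
        _exit(code);
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return wstatus;
}
```

Here `decode_wait_status(run_child(14, 0))` reports a normal exit with code 14, while `decode_wait_status(run_child(0, SIGSEGV))` reports termination by `SIGSEGV`, so the two cases stay distinguishable as long as the caller has the full wait status.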
Where this gets confusing is that `sh` doesn't expose this information. Instead it sets the high bit when `WIFSIGNALED` and returns that to the user, so `SIGABRT` (signal number `6`) turns into a return code in `$?` of `134` (`134 - 128 = 6`), meaning you can't tell the difference between `exit(134)` and `SIGABRT`:
(man bash /EXIT STATUS) The return value of a simple command is its exit status, or 128+n if the command is terminated by signal n.
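The collapse the shell performs can be approximated in a few lines of C. This is only a sketch of the rule quoted from the man page, not the shell's actual implementation, and both function names are made up:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Collapse a wait status into the single number a shell shows in $?:
 * the exit code for a normal exit, 128 + n for death by signal n. */
int shell_exit_status(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;              /* stopped or continued: not a termination */
}

/* Run a child that calls abort() if `do_abort` is nonzero and otherwise
 * exits with `code`; return the shell's view of how it ended. */
int shell_view_of_child(int code, int do_abort)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (do_abort)
            abort();        /* raises SIGABRT in the child */
        _exit(code);
    }
    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return shell_exit_status(wstatus);
}
```

On Linux, where `SIGABRT` is 6, `shell_view_of_child(0, 1)` and `shell_view_of_child(134, 0)` both come out as 134, which is exactly the ambiguity described above.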
Furthermore `sh` returns `126` and `127` if commands are respectively not executable, or not found.
So for practical cases, we probably want to limit tenders (and in turn strongly suggest to unikernels) to return exit codes in the range of `0 .. 125` to make it easier to work reliably with tenders from the shell.
We could reserve more of these for solo5-specific errors, like from 112 (`0x70`) and up?
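To illustrate what such a split might look like from the host side, here is a hypothetical classifier. Nothing in it exists in Solo5: the 112 boundary is just the number floated above, and all names are invented for the example:

```c
/* Hypothetical partition of the 0..255 exit code space as seen from a
 * shell. The Solo5 range is only the proposal above, not an existing
 * convention. */
enum status_class {
    STATUS_APPLICATION,     /* 0..111: free for the unikernel */
    STATUS_SOLO5,           /* 112..125: proposed Solo5-specific errors */
    STATUS_SHELL,           /* 126, 127: not executable / not found */
    STATUS_SIGNAL,          /* 128..255: used by sh for 128 + signal */
    STATUS_INVALID          /* outside 0..255, cannot reach the host */
};

#define SOLO5_RESERVED_FIRST 112    /* 0x70, assumed boundary */
#define SHELL_SAFE_LAST      125

enum status_class classify_status(int code)
{
    if (code < 0 || code > 255)
        return STATUS_INVALID;
    if (code < SOLO5_RESERVED_FIRST)
        return STATUS_APPLICATION;
    if (code <= SHELL_SAFE_LAST)
        return STATUS_SOLO5;
    if (code <= 127)
        return STATUS_SHELL;
    return STATUS_SIGNAL;
}
```

A wrapper script or monitoring tool could use something like this to decide whether a tender's exit came from the application, from Solo5 itself, or from the shell and signal machinery around it.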
I like the `ABORT` hypercall suggestion, that seems cleaner than relying on magic values of the exit code.