
reserved exit codes

Status: Open · hannesm opened this issue 5 years ago · 2 comments

From solo5.h: "Status values of 255 and above are reserved for use by Solo5."

Now, reading more about exit codes, it seems they are usually taken modulo 256. Should the comment instead say "a status value of 255 is reserved for use by Solo5"? Or is it expected that further reserved exit codes will be needed (e.g. 250 and above being Solo5-specific)?

hannesm, Oct 13 '19 16:10

The comment is a bit misleading. There are two notions of "exit code" involved:

  1. The status passed to solo5_exit(). This is just an int.
  2. The exit code returned by the tender (hvt) or process (spt) to the host.
    1. In the case of hvt, this is done by returning from hvt_vcpu_loop(), see here https://github.com/Solo5/solo5/blob/master/tenders/hvt/hvt_kvm_x86_64.c#L194 and here https://github.com/Solo5/solo5/blob/master/tenders/hvt/hvt_main.c#L278.
    2. In the case of spt, this is done by the bindings here https://github.com/Solo5/solo5/blob/master/bindings/spt/platform.c#L46

In both cases of (2) above, the resulting code is passed to exit(3) (and ultimately the _exit(2) system call), which effectively computes status & 0xff, so codes >= 256 are "lost": 256 is reported to the host as 0, 257 as 1, and so on.
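To see the truncation concretely, here is a minimal C sketch, assuming a POSIX host; `observed_exit_code()` is a hypothetical helper, not part of Solo5. It forks a child, has the child exit with a given status, and returns what the parent observes through waitpid().

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `status` and return the
 * exit code the parent observes. Only status & 0xff survives the trip. */
static int observed_exit_code(int status)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0)
        _exit(status);   /* child: the kernel keeps only the low 8 bits */

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    /* -1 means "did not exit normally", which should not happen here. */
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Calling it with 1, 255, 256, 257 and 511 should yield 1, 255, 0, 1 and 255 respectively, which is why a tender cannot hand statuses >= 256 to the host unchanged.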

The special code 255 is meant to tell the tender (currently only hvt) to do "other things to do in case of an abort", e.g. dump a guest core file if configured/compiled to do so.

What we should probably do instead is reserve codes >= 256 at the Solo5 layer (so as not to take any of 0..255 away from the application), and/or possibly add a separate ABORT hypercall.
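A minimal sketch of the first option, assuming a tender that receives the raw int from solo5_exit() before calling exit() on the host. The constants and `sketch_classify_status()` are hypothetical and do not exist in solo5.h; reporting aborts to the host as 255 is likewise an assumption.

```c
#include <stdbool.h>

/* Hypothetical constants: statuses >= 256 are reserved for Solo5 itself,
 * leaving the full 0..255 range to the application. Not part of solo5.h. */
#define SKETCH_SOLO5_RESERVED_BASE 256
#define SKETCH_SOLO5_EXIT_ABORT    SKETCH_SOLO5_RESERVED_BASE

/* What the tender should do with a status received from solo5_exit(). */
struct sketch_exit_action {
    bool abort;      /* run abort handling, e.g. dump a guest core file */
    int  host_code;  /* value the tender would pass to exit() on the host */
};

static struct sketch_exit_action sketch_classify_status(int status)
{
    struct sketch_exit_action a = { .abort = false, .host_code = 0 };

    if (status == SKETCH_SOLO5_EXIT_ABORT) {
        a.abort = true;
        a.host_code = 255;     /* assumption: aborts are reported as 255 */
    } else if (status < 0 || status >= SKETCH_SOLO5_RESERVED_BASE) {
        a.host_code = 255;     /* assumption: unknown reserved values are errors */
    } else {
        a.host_code = status;  /* 0..255 passes through unchanged */
    }
    return a;
}
```

With this split, an application calling solo5_exit(255) no longer triggers the tender's abort handling; only the reserved value does. The host may still see 255 in both cases, so the distinction matters to the tender rather than to the shell.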

mato, Oct 14 '19 15:10

In POSIX you distinguish between application-generated exits and exits imposed by a signal (i.e. by the kernel). Here is a Stack Overflow post that goes over this with pedantic attention to exit code semantics :) https://stackoverflow.com/questions/5149228/return-value-range-of-the-main-function On Linux this is done by packing the truncated exit code (& 0xff) into a status word laid out like 0xAAbb, where the low 7 bits of bb hold the terminating signal number (with bit 0x80 set if a core was dumped), and AA is the application exit code iff bb == 0x00.
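The Linux packing can be decoded by hand as in the sketch below. This is for illustration only: the bit layout is a Linux/glibc implementation detail, the helper name `decode_linux_wait_status()` is hypothetical, and portable code should use the W* macros instead. Stopped and continued statuses, which only appear with WUNTRACED/WCONTINUED, are deliberately not handled.

```c
#include <stdbool.h>

/* Fields of a raw wait status laid out as 0xAAbb (Linux layout). */
struct decoded_status {
    bool exited;     /* low 7 bits of bb are zero: normal exit */
    int  exit_code;  /* AA byte (bits 8..15), meaningful iff exited */
    int  signal;     /* low 7 bits of bb: terminating signal, iff !exited */
    bool core;       /* bit 0x80 of bb: a core file was dumped */
};

/* Assumes `raw` came from wait()/waitpid() without WUNTRACED/WCONTINUED,
 * so it is either a normal exit or a termination by signal. */
static struct decoded_status decode_linux_wait_status(int raw)
{
    struct decoded_status d = { false, 0, 0, false };
    int sig = raw & 0x7f;

    if (sig == 0) {
        d.exited = true;
        d.exit_code = (raw >> 8) & 0xff;
    } else {
        d.signal = sig;
        d.core = (raw & 0x80) != 0;
    }
    return d;
}
```

For example, a raw status of 0x0e00 decodes to a normal exit with code 14, and 0x0086 to termination by signal 6 (SIGABRT on Linux) with a core dump.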

When you call wait() or waitpid() you get an integer status; you are supposed to use macros like WIFEXITED() / WEXITSTATUS() and WIFSIGNALED() / WTERMSIG() to tell what caused the exit, so that you can distinguish between e.g. a SIGSEGV and a process simply calling exit(14).
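A portable version of the same distinction, using only the macros from <sys/wait.h>. It is a sketch for a POSIX host; `run_child()` and `decode_status()` are hypothetical helpers. The child lowers RLIMIT_CORE to 0 before raising the signal so that the demonstration normally does not leave a core file behind.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum term_kind { TERM_EXITED, TERM_SIGNALED, TERM_OTHER };

struct termination {
    enum term_kind kind;
    int value;  /* exit code if TERM_EXITED, signal number if TERM_SIGNALED */
};

/* Classify a wait status with the portable W* macros only. */
static struct termination decode_status(int wstatus)
{
    struct termination t = { TERM_OTHER, 0 };
    if (WIFEXITED(wstatus)) {
        t.kind = TERM_EXITED;
        t.value = WEXITSTATUS(wstatus);
    } else if (WIFSIGNALED(wstatus)) {
        t.kind = TERM_SIGNALED;
        t.value = WTERMSIG(wstatus);
    }
    return t;
}

/* Fork a child that raises `sig` (if non-zero) or else calls _exit(code),
 * and return its raw wait status. */
static int run_child(int code, int sig)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        if (sig != 0) {
            struct rlimit no_core = { 0, 0 };
            (void)setrlimit(RLIMIT_CORE, &no_core);  /* avoid a core file */
            signal(sig, SIG_DFL);                    /* ensure default action */
            raise(sig);
        }
        _exit(code);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    return wstatus;
}
```

Here decode_status(run_child(14, 0)) reports a normal exit with value 14, while decode_status(run_child(0, SIGABRT)) reports termination by SIGABRT, so the two cases stay distinct.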

Where this gets confusing is that sh doesn't expose this information directly. Instead, when WIFSIGNALED is true it reports 128 plus the signal number to the user, so SIGABRT (signal 6) turns into a return code of 134 in $? (134 - 128 = 6), meaning you can't tell the difference between exit(134) and SIGABRT:

  (man bash /EXIT STATUS)
  The return value of a simple command is its exit status,  or  128+n  if
  the command is terminated by signal n.
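The shell's collapse of the two cases can be reproduced with a small sketch, assuming a POSIX host; `shell_style_status()` and `status_of()` are hypothetical helpers. It applies the 128+n rule from the bash man page to a child that calls exit(134) and to one killed by SIGABRT.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mimic what bash stores in $? for a simple command: the exit status, or
 * 128+n if the command was terminated by signal n. */
static int shell_style_status(int wstatus)
{
    if (WIFEXITED(wstatus))
        return WEXITSTATUS(wstatus);
    if (WIFSIGNALED(wstatus))
        return 128 + WTERMSIG(wstatus);
    return -1;  /* stopped/continued: not modeled in this sketch */
}

/* Fork a child that raises `sig` (if non-zero) or else exits with `code`,
 * and return its raw wait status. */
static int status_of(int code, int sig)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    }
    if (pid == 0) {
        if (sig != 0) {
            struct rlimit no_core = { 0, 0 };
            (void)setrlimit(RLIMIT_CORE, &no_core);  /* avoid a core file */
            signal(sig, SIG_DFL);
            raise(sig);
        }
        _exit(code);
    }

    int wstatus;
    if (waitpid(pid, &wstatus, 0) < 0) {
        perror("waitpid");
        exit(EXIT_FAILURE);
    }
    return wstatus;
}
```

On Linux, where SIGABRT is 6, both shell_style_status(status_of(134, 0)) and shell_style_status(status_of(0, SIGABRT)) evaluate to 134, whereas WIFSIGNALED() on the raw status still tells them apart.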

Furthermore, sh returns 126 if a command is found but not executable, and 127 if it is not found. So in practice we probably want to limit tenders (and in turn strongly suggest that unikernels limit themselves) to exit codes in the range 0..125, to make it easier to work reliably with tenders from the shell.

We could reserve more of these for solo5-specific errors, like from 112 (0x70) and up?
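One way to write down the ranges discussed here is a hypothetical classification helper. None of these names exist in Solo5, and the 112 (0x70) cut-off is only the tentative value floated in this comment, not an agreed policy.

```c
/* Hypothetical policy sketch; none of these names exist in Solo5. */
enum exit_code_class {
    CODE_CLASS_APPLICATION,  /* 0..111: free for the unikernel */
    CODE_CLASS_SOLO5,        /* 112..125: reserved for Solo5/tender errors */
    CODE_CLASS_SHELL,        /* 126, 127: "not executable" / "not found" */
    CODE_CLASS_SIGNAL,       /* 128..255: may collide with 128+n signal codes */
    CODE_CLASS_INVALID       /* outside 0..255: truncated by exit() */
};

#define SKETCH_SOLO5_FIRST_RESERVED 112  /* 0x70, the tentative suggestion */

static enum exit_code_class classify_exit_code(int code)
{
    if (code < 0 || code > 255)
        return CODE_CLASS_INVALID;
    if (code < SKETCH_SOLO5_FIRST_RESERVED)
        return CODE_CLASS_APPLICATION;
    if (code <= 125)
        return CODE_CLASS_SOLO5;
    if (code <= 127)
        return CODE_CLASS_SHELL;
    return CODE_CLASS_SIGNAL;
}
```

A tender following this policy would only ever emit codes in the CODE_CLASS_APPLICATION or CODE_CLASS_SOLO5 ranges, leaving 126 and above to the shell.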

I like the ABORT hypercall suggestion, that seems cleaner than relying on magic values of the exit code.

cfcs, Oct 15 '19 13:10