sandals
A lightweight process isolation tool, requiring absolutely no privileges to run
A lightweight process isolation tool for Linux. It is built using Linux namespaces, cgroups v2, and seccomp-bpf syscall filters.
$ echo '{"cmd":["ps","-A"],
"pipes":[{"dest":"/dev/stdout","stdout":true}],
"mounts":[{"type":"proc","dest":"/proc"}]
}' | sandals
PID TTY TIME CMD
1 ? 00:00:00 sandals
2 ? 00:00:00 ps
{"status":"exited","code":0}
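The same exchange can be driven from a program. Below is a minimal Python sketch, assuming a sandals binary is installed on PATH and that the response is the only thing written to standard output (that is, no pipe targets /dev/stdout). The helper names encode_request, decode_response and run_sandals are made up for illustration and are not part of sandals.

```python
import json
import subprocess


def encode_request(cmd, **options):
    """Serialize a request; cmd is the only mandatory key."""
    request = {"cmd": list(cmd)}
    request.update(options)
    return json.dumps(request)


def decode_response(text):
    """Parse the JSON response and check that it carries a status key."""
    response = json.loads(text)
    if "status" not in response:
        raise ValueError("response has no status key")
    return response


def run_sandals(cmd, binary="sandals", **options):
    """Feed the request to sandals on stdin and return the parsed response.

    Assumption: `binary` resolves to an installed sandals executable.
    """
    proc = subprocess.run(
        [binary],
        input=encode_request(cmd, **options),
        capture_output=True,
        text=True,
    )
    return decode_response(proc.stdout)
```

With sandals installed, a call such as run_sandals(["uname", "-a"]) should return the response as a Python dict; the status key tells how the task ended.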
Highlights
- speaks JSON: In a departure from long-standing UNIX tradition, sandals reads a JSON request from STDIN and writes a JSON response to STDOUT, instead of relying on program arguments and exit code.
- detailed status: The response tells whether the task exited normally, was killed, or was terminated due to compromised integrity (see below).
- task integrity: A sandbox introduces new modes of failure, e.g. a disk-full error due to a filesystem quota in effect, or a subprocess terminated for exceeding the memory limit. In such cases the task's integrity is compromised: it is unlikely to recover, and it might produce bogus results if it is not prepared to handle the unusual failure. The compromised task is terminated; the detailed status gives the reason and hints whether certain limits should be increased.
- rootless: No privileges required. No suid binaries involved.
- as fast as possible: Wrapping a task in docker run reportedly adds 300ms of overhead. Sandals reduces that to a mere 5ms, a boon for short-lived tasks!
- exposes Linux kernel features instead of inventing higher-level abstractions: Primarily a side effect of being fast and lean. We aren't Docker; the user should be sufficiently versed in namespaces, cgroups, seccomp and mounts.
Other mature lightweight sandboxes are readily available: isolate (est. 2013), firejail (est. 2014) and nsjail (est. 2015), to name a few. They are similarly low-level and comparably fast (though we managed to shave off almost 20ms by shutting down the namespaced process asynchronously). Our other features are unparalleled, though.
Most notably, existing solutions require elevated privileges in order to set up a constrained sandbox. This is an inherent limitation of cgroups v1. Sandals, on the other hand, takes advantage of cgroups v2.
Installation
$ make && make install
User guide
(This guide is not meant to be exhaustive, check Reference for further details.)
Sandals runs untrusted code in a private set of Linux namespaces. Namespaces partition system objects. E.g. IPC namespaces partition IPC objects. Sandals creates the full set of private namespaces by default. Sandboxed code is unable to interact with host processes, either directly via signals or indirectly via IPC or sockets. It has no access to the network either.
Restricting access to the filesystem
TODO
Limiting resource usage with cgroups
TODO
Reference
Response
JSON object with status key. Depending on the status, additional keys might be present:
- exited: exited normally
  - code: process exit code
- killed: killed by signal
  - signal: killing signal (string, e.g. SIGTERM)
- memoryLimit: memory limit as set by cgroup's memory.max exceeded
- pidsLimit: pids limit as set by cgroup's pids.max exceeded
- timeLimit: run time limit exceeded, see timeLimit in Request section
- outputLimit: task output size limit exceeded, see pipes, copyFiles and stdStreams in Request section
- requestInvalid: request JSON invalid
  - description: error description
- internalError: internal error
  - description: error description
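A caller will typically branch on the status key. The sketch below turns a response into a one-line summary; it is only an illustration of the statuses listed above. The function name summarize and the wording of the hints are made up here, not produced by sandals.

```python
# Which request setting to look at when a limit status is reported.
LIMIT_HINTS = {
    "memoryLimit": "memory.max in cgroupConfig",
    "pidsLimit": "pids.max in cgroupConfig",
    "timeLimit": "timeLimit",
    "outputLimit": "limit in pipes, copyFiles or stdStreams",
}


def summarize(response):
    """Return a human-readable summary of a sandals response dict."""
    status = response["status"]
    if status == "exited":
        return f"exited with code {response['code']}"
    if status == "killed":
        return f"killed by {response['signal']}"
    if status in LIMIT_HINTS:
        return f"terminated, {status} hit; consider raising {LIMIT_HINTS[status]}"
    if status in ("requestInvalid", "internalError"):
        return f"{status}: {response['description']}"
    return f"unknown status {status!r}"
```

For the response shown at the top of this document, summarize({"status": "exited", "code": 0}) returns "exited with code 0".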
Request
JSON object with the mandatory cmd key.
Ex: {"cmd":["uname","-a"]}.
Optional keys:
- hostName: string
  Host name as observed inside sandbox. Default: sandals.
- domainName: string
  Domain name as observed inside sandbox. Default: sandals.
- uid: number
  User id as observed inside sandbox. Default: 0.
- gid: number
  Group id as observed inside sandbox. Default: 0.
- chroot: string
  A path to use as filesystem root inside sandbox. Default: /.
- mounts: object []
  A list of mounts to apply to filesystem view inside sandbox. Default: [].
  Every mount has the mandatory type key. Type is either bind for a bind mount or a value recognized by the mount system call, like tmpfs or proc.
  The mandatory dest key tells the destination of the mount. Chroot prefix (if any) is automatically prepended to the given path.
  Bind mounts must specify the source path with the src key.
  Optional options string (empty by default) is passed to the mount syscall verbatim.
  Optional ro boolean key (default: false) turns on read-only mode for the mount.
- cgroupConfig: object
  Enable resource limiting with cgroups. Default: cgroups not used.
  Ex: {"memory.max": "1000000", "memory.swap.max": "1000000"}. For each key/value pair, the value is written to the file named by key in the cgroup directory.
- cgroup, cgroupRoot: string
  If cgroupConfig is present, the sandboxed task is put into a separate cgroup. This is either an existing cgroup as specified by the cgroup key, or a new one if that key is absent.
  A new cgroup is created under cgroupRoot if present; otherwise it is created as a sibling of the current cgroup.
  If a new cgroup was created, it is removed when sandals exits.
- seccompPolicy: string
  A syscall filtering policy in Kafel syntax. Filtering disabled by default.
- vaRandomize: boolean
  ASLR, enabled by default.
- env: string []
  Task's environment as a list of KEY=VALUE strings. Empty by default.
- workDir: string
  Task's working directory. Default: /.
  Chroot prefix (if any) is automatically prepended to the given path.
- timeLimit: number
  A time limit in seconds. No limit by default.
- pipes: object []
  A list of unidirectional channels for streaming data out of the sandbox.
  Mandatory dest key names the destination file to write the data to.
  At least one of stdout, stderr or src keys must be present. If the stdout key is set to true, the pipe is attached as the task's standard output. Set stderr to true to attach the standard error stream. The last option is to expose the pipe ingress as a fifo in the filesystem. Chroot prefix (if any) is automatically prepended to the path in the src key.
  Optional limit numeric key caps the maximum amount of collected data (no limit by default). If the limit is exceeded, the task terminates with status:outputLimit.
- copyFiles: object []
  A list of files to copy out of the sandbox once it terminates.
  Subkeys are the same as in pipe object, see pipes.
- stdStreams: object
  Subkeys (same meaning as in pipe object, see pipes):
  - dest
  - limit
  Capture both stdout and stderr simultaneously. Use case: present the task's output exactly as it would appear if invoked in a terminal, styling stderr content differently (e.g. in red).
  Every chunk of data is prefixed with a 32-bit integer in network byte order. Bits 0..30 encode the chunk length. Bit 31 tells whether the chunk origin is stdout (0) or stderr (1).
  Note: if the size limit is exceeded, the last packet (header + chunk) might be cut short.
  Note: capturing a task's output exactly as it would appear if invoked in a terminal, while distinguishing stdout/stderr content, is tricky. In most programming languages, a different buffering strategy is used when a stream is writing to a terminal. If the same pipe is attached to stdout and stderr, it won't be possible to tell which stream a particular octet originated from. If two pipes are used, preserving the relative ordering of chunks is challenging.
  The current implementation utilizes UNIX datagram sockets, which solve the ordering problem nicely. The caveat is that any write to stdout/stderr exceeding the maximum datagram size (~200KiB) will fail. Also, stream buffering does not match that of a terminal.
  If the perfect solution is desired, we suggest installing a custom Linux kernel module (available separately). Sandals will use the kernel module if present.
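The stdStreams framing described above can be decoded with a few lines of code. The following Python sketch assumes bit 31 is the most significant bit of the big-endian header word; decode_std_streams is a made-up helper name. A final chunk cut short by the size limit is returned as-is, and a trailing partial header is dropped.

```python
import struct


def decode_std_streams(data):
    """Yield (origin, chunk) pairs from the bytes of a stdStreams capture."""
    offset = 0
    # Stop when fewer than 4 bytes remain: the last header was cut short.
    while offset + 4 <= len(data):
        (header,) = struct.unpack_from(">I", data, offset)  # network byte order
        origin = "stderr" if header >> 31 else "stdout"      # bit 31: origin
        length = header & 0x7FFFFFFF                         # bits 0..30: length
        offset += 4
        # Slicing tolerates a truncated final chunk.
        yield origin, data[offset:offset + length]
        offset += length
```

A caller could, for example, iterate over the pairs and print stderr chunks in red while passing stdout chunks through unchanged.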
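To tie the request keys together, here is a sketch of a fuller request built in Python, along with a small client-side check of the mandatory keys documented above. The paths, limits and the check_request helper are hypothetical and chosen only for illustration; sandals performs its own validation and reports problems with status requestInvalid.

```python
import json


def check_request(request):
    """Return a list of problems found in a request dict (empty if none)."""
    problems = []
    if "cmd" not in request:
        problems.append("cmd is mandatory")
    for i, mount in enumerate(request.get("mounts", [])):
        for key in ("type", "dest"):
            if key not in mount:
                problems.append(f"mounts[{i}]: {key} is mandatory")
        if mount.get("type") == "bind" and "src" not in mount:
            problems.append(f"mounts[{i}]: bind mount needs src")
    for i, pipe in enumerate(request.get("pipes", [])):
        if "dest" not in pipe:
            problems.append(f"pipes[{i}]: dest is mandatory")
        if not any(key in pipe for key in ("stdout", "stderr", "src")):
            problems.append(f"pipes[{i}]: need one of stdout, stderr, src")
    return problems


# Hypothetical paths and limits, for illustration only.
example_request = {
    "cmd": ["ps", "-A"],
    "chroot": "/tmp/sandbox-root",
    "mounts": [
        {"type": "bind", "src": "/usr", "dest": "/usr", "ro": True},
        {"type": "proc", "dest": "/proc"},
        {"type": "tmpfs", "dest": "/tmp"},
    ],
    "cgroupConfig": {"memory.max": "1000000", "memory.swap.max": "1000000"},
    "timeLimit": 5,
    "pipes": [{"dest": "/tmp/task.out", "stdout": True, "limit": 65536}],
}

# The JSON text that would be written to sandals' standard input.
request_json = json.dumps(example_request)
```

Note that mount destinations are given relative to the chroot (the prefix is prepended automatically), while the pipe dest names a file outside the sandbox.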