--set-env incompatibile with environments that have multiline variables
Per the man page of /proc/[pid]/environ:
The entries are separated by null bytes ('\0'), and there may be a null byte at the end.
The code assumes that the environment variables are separated by \n and not \0:
https://github.com/hpc/charliecloud/blob/master/bin/ch-run.c#L294
Should be easy enough to swap getline() for getdelim() and that would allow using complex environments.
Newline-separated was done deliberately, though I don't recall the exact reasoning. One example is hand-editing the environment file; much harder to have zero-delimited records in a text editor.
You have a good point though.
Options include:
- Do nothing; environment variables with newline are not supported.
- Change
/ch/environmentto zero-separated records; no option for newline-separated. - Add a second option for zero-separated records, e.g.
--set-env-zero. - Some heuristic to select between zero-separated and newline-separated, e.g. if the file contains a zero byte it is zero-separated. This would mean a file with a single variable that contained newlines was not supported.
- that breaks anyone who wants a non-trivial environment
- possible
- this works as it allows both options and for hand-edited env when needed
- rather not risk a security surprise with this one IMHO
There isn't really a perfectly clean solution that I know of; lots of *nix utilities can read/generate either newline- or NUL-delimited data, and without exception (AFAIK), they use command line switches to differentiate (e.g., xargs -0, find -print0).
My recommendation would be one (or more) of 3 things:
- Add
--set-env0to allow NUL-delimited environment files; - Support the use of escaped newlines (i.e.,
\at EOL) to allow embedded newlines via line continuation; and/or - Parse the environment files in a more shell-like way that supports quoted newlines (as seen in the output of
declare -p IFS COMP_WORDBREAKS).
If you did want to go the heuristic route, though, it could be pretty straight-forward: If you encounter any NUL bytes, you can safely assume it's NUL-delimited because it will be anyway. NUL bytes cannot exist in environment variables, nor do shells support them. Only ZSH even allows NUL bytes its own shell variables at all, and even then they cannot pass intact to any subcommands or other shells.
Thus, from the user perspective, the rule is either
- Each environment variable ends in a newline (i.e.,
^J,LF, integer value 10/0x0a/\012), or - Each environment variable ends in a NUL byte (
^@, integer value 0).
With that being the rule, if you make it to EOF without having read in a NUL byte, you can safely assume the file is newline-delimited, and vice versa. Even a single-variable environment file with an embedded newline will work because the last character of the file would still need to be either a newline (NL-delimited), a NUL byte (NUL-delimited), or neither (technically invalid, but assuming NL-delimited is likely a safe default).
In the end, environment variables are always NUL-terminated; just as with argv[], envp[] as provided to each executable by the kernel at runtime is, always, a NULL-terminated list of NUL-terminated strings, so it's inherently safe to read, represent, and store them in that way. But will it match up with user expectations and the Principle of Least Surprise? That may be an entirely different matter....
I'm inclined to go the heuristic route, since (a) this avoids the need to increase UI complexity (no new option) and (b) the error case seems rather obscure (NUL-delimited, single variable only, that variable contains newlines, 2nd and subsequent lines in the value all look like assignments, incorrectly terminated via missing NUL terminator).
Thinking about this again, what are your thoughts on this:
Read in the entire file, and check the very last character (as we alluded to above); if the final character is either newline or a NUL byte, that's the delimiter, and if the last byte is neither 0x0a nor 0x00, just error out entirely.
Would that be reasonable? It's admittedly a bit more strict, but given Charliecloud's priority on security and safety, I think that's probably a good thing.
Read in the entire file, and check the very last character
I like that except proc(5) says /proc/*/environ “may” end with \0. Mine does but can we count on it?
Read in the entire file, and check the very last character I like that except
proc(5)says/proc/*/environ“may” end with\0. Mine does but can we count on it?
man 5 procfs is a little more ambiguous than I would prefer:
/proc/[pid]/environ This file contains the initial environment that was set when the currently executing program was started via execve(2). The entries are separated by null bytes ('\0'), and there may be a null byte at the end. Thus, to print out the environment of process 1, you would do: $ cat /proc/1/environ | tr '\000' '\n'
If I'm reading the kernel code correctly at:
https://github.com/torvalds/linux/blob/3bc1bc0b59d04e997db25b84babf459ca1cd80b7/fs/proc/base.c#L976-L979 The kernel will blindly copy over the memory region for environ up to the page count.
For ELF (and Flat) formatted binaries: it looks like the kernel will assemble a NULL-terminated sequential array of pointers on the stack but will not include that NULL pointer when populating the environ procfs file.
https://github.com/torvalds/linux/blob/a1901b464e7e3e28956ae7423db2847dbbfb5be8/fs/binfmt_elf.c#L355 https://github.com/torvalds/linux/blob/master/fs/binfmt_elf_fdpic.c#L718 https://github.com/torvalds/linux/blob/master/fs/binfmt_flat.c#L156
So if I understand the kernel code correctly, the environment will be populated with whatever the binary format has, which is loaded by the kernel in the .text section blindly. So short of looking for the ELF magic on the stack, I can't find a reliable way to externally tell if a loading process is ELF (or other).
Care to make a new arg that is NULL terminated instead?
Looks like Docker does poorly with newlines too (e.g. moby/moby#32140, moby/moby#28617, estafette/estafette-ci-builder#7).