Userland: Add BuggieBox program
As preparation for getting back to #14936, I decided we need a leaner layout for the upcoming initramfs image feature. To accomplish this, we need a small, self-contained binary that can do pretty much anything you would expect during boot: mount filesystems, remove directories, list files, create nodes in /dev, print the contents of a file, etc. I chose the name as a nod to the famous BusyBox utility, which is awesome (and is used by many embedded Linux distributions), so I want to mimic the same functionality, but in the Serenity fashion we all know and love :)
Now, I know static binaries are quite bad in general, as they tend to rot: you can't really expect to take a 5-year-old static binary and run it without it failing due to API/ABI changes between userspace and the kernel. However, we are not Linux, and we don't care about keeping the ABI and userspace stable as long as people can just recompile and things work again. This case is an exception to the rule, because in an initramfs environment we should not expect people to copy a bunch of libraries into the image just to make sure they can boot. Ideally they place this binary and a config file and that's it: 2 files for a proper boot sequence.
In that fashion, what I imagine happening soon is that BuggieBox becomes a toolbox of many well-known utilities, so we can use it as a Swiss Army knife like BusyBox. I'd also expect it to have a small amount of bootstrapping code to do init stuff, so if it is invoked with the right parameter, it will turn itself into a fast init shim that invokes SystemServer to continue boot.
For now, it just compiles statically and says "Hello SerenityOS Box" in the kernel debug log. I expect to add more functionality later on, but I'd rather put it on the display shelf now to get feedback and see where it goes from there.
cc @timschumi @ADKaster
Would BuggieBox be a cool name for this?
I think you are right, thanks for the suggestion!
@supercomputer7 please undraft the PR if you consider it ready for review :)
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions!
This pull request has been closed because it has not had recent activity. Feel free to re-open if you wish to still contribute these changes. Thank you for your contributions!
Let's try this again :)
I intend to ensure this utility is good for both init environments and jailed environments.
The idea for a jailed environment is that you don't need to copy all of /bin into it, only the BuggieBox binary, which contains a small set of utilities so that the attack surface in such an environment stays small.
Taking a completely different approach this time, I decided not to try to create a static binary at first. So we start with a dynamically linked binary that contains many well-known utilities.
In the next PRs, I will try to make this static and I will add more CLI utilities to the binary so it is suitable for an initramfs archive and for a jail environment. Another idea I had was to create a mechanism that can include/remove utilities at compile time, which could be very helpful in case anybody wants to harden a jail environment, or just doesn't think a particular utility is necessary in the initramfs.
Oh, and I almost forgot: we still don't have the ability to use this at boot time, in the sense that, unlike BusyBox, it can't yet do a sequence of things during boot. This feature will be added close to the date I return to work on the initramfs patch.
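One way the compile-time include/remove mechanism mentioned above could look is a set of per-utility preprocessor switches that decide which names end up in the tool table. This is only a sketch in plain C++: the `BUGGIEBOX_ENABLE_*` macros and `is_tool_compiled_in` are hypothetical names for illustration, not options or functions that exist in the SerenityOS tree.

```cpp
#include <string_view>

// Hypothetical switches. A build system could override them with
// -DBUGGIEBOX_ENABLE_RM=1 and so on; these defaults are only for the sketch.
#ifndef BUGGIEBOX_ENABLE_CAT
#    define BUGGIEBOX_ENABLE_CAT 1
#endif
#ifndef BUGGIEBOX_ENABLE_MOUNT
#    define BUGGIEBOX_ENABLE_MOUNT 1
#endif
#ifndef BUGGIEBOX_ENABLE_RM
#    define BUGGIEBOX_ENABLE_RM 0 // e.g. left out of a hardened jail build
#endif

// Only enabled utilities get an entry. At least one switch must stay on,
// because an empty array is not valid C++.
constexpr std::string_view compiled_in_tools[] = {
#if BUGGIEBOX_ENABLE_CAT
    "cat",
#endif
#if BUGGIEBOX_ENABLE_MOUNT
    "mount",
#endif
#if BUGGIEBOX_ENABLE_RM
    "rm",
#endif
};

// Returns true if the named utility was selected at compile time.
constexpr bool is_tool_compiled_in(std::string_view name)
{
    for (auto tool : compiled_in_tools) {
        if (tool == name)
            return true;
    }
    return false;
}

// With the defaults above, "cat" is present and "rm" is not.
static_assert(is_tool_compiled_in("cat"));
static_assert(!is_tool_compiled_in("rm"));
```

A disabled utility would not be compiled or linked at all, which is what would make this useful for shrinking the attack surface of a jail image.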
Is there a chance that we can skip linking against `LibUtil` for the on-system tools and just `#include <Userland/Libraries/LibUtil/tool.cpp>` instead? Loading the entire LibUtil just to run one tool when we don't actually need the advantage of "everything is one static file" seems especially wasteful.
Also, I'm not entirely sure about moving away the source code for utilities from `Userland/Utilities` in general. Can't we just do something like the following (`cat` being an example here)?
`Userland/Utilities/cat.cpp`:

```cpp
// <tool source>

ErrorOr<int> cat_main(Main::Arguments arguments)
{
    // cool argument handling
}

#ifndef EXCLUDE_SERENITY_MAIN
ErrorOr<int> serenity_main(Main::Arguments arguments)
{
    return cat_main(arguments);
}
#endif
```

`BuggieBox` would then just specify all the tools that it wants to include as source files directly, set that flag to exclude the `serenity_main` inside each tool, and itself only ship a custom `serenity_main` that redirects based on `argv[0]`, skipping `LibUtil` entirely.
Since that keeps the patch diff to a minimum, and we don't have to move around tools in parts or completely into an arbitrary library, that would generally seem like a superior solution to me.
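The `argv[0]` redirection described above can be sketched in plain C++ as follows. This is a minimal illustration rather than the code from the PR: the stub `cat_main`/`ls_main` functions stand in for the real per-tool entry points, they use a plain `int (int, char**)` signature instead of `ErrorOr<int>(Main::Arguments)`, and `buggiebox_main`, `find_tool` and `basename_of` are hypothetical names.

```cpp
#include <cstdio>
#include <string_view>

// Hypothetical stubs. In the proposal these would be the tool sources from
// Userland/Utilities, compiled with EXCLUDE_SERENITY_MAIN defined.
static int cat_main(int, char**) { std::puts("cat: stub"); return 0; }
static int ls_main(int, char**) { std::puts("ls: stub"); return 0; }

using ToolMain = int (*)(int, char**);

struct Tool {
    std::string_view name;
    ToolMain entry;
};

// The list of tools bundled into the multi-call binary.
static constexpr Tool tools[] = {
    { "cat", cat_main },
    { "ls", ls_main },
};

// Strip leading directories so "/bin/cat" and "cat" both resolve to "cat".
static std::string_view basename_of(std::string_view path)
{
    auto slash = path.rfind('/');
    return slash == std::string_view::npos ? path : path.substr(slash + 1);
}

// Look up the entry point for the name the binary was invoked as.
static ToolMain find_tool(std::string_view invoked_as)
{
    auto name = basename_of(invoked_as);
    for (auto const& tool : tools) {
        if (tool.name == name)
            return tool.entry;
    }
    return nullptr;
}

// Hypothetical top-level entry point: forwards to the tool named by argv[0].
int buggiebox_main(int argc, char** argv)
{
    if (argc < 1 || argv[0] == nullptr)
        return 1;
    if (auto entry = find_tool(argv[0]))
        return entry(argc, argv);
    std::fprintf(stderr, "BuggieBox: unknown tool '%s'\n", argv[0]);
    return 1;
}
```

With a layout like this, a symlink or hard link named `cat` that points at the single binary would be enough to expose the bundled `cat`, and each tool keeps its source file in one place.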
Well, if I don't link the userland utilities against the library then compilation won't work. Also, how on earth is `#include <Userland/Libraries/LibUtil/tool.cpp>` supposed to work? Even if it works, trying to include a cpp file is a horrible thing and we should not do that.
As for the second suggestion, I thought about it, but it looks like an utter mess - sharing code in a library seems to me like the proper way to do that, and using #ifdef soup just to ensure we don't move the code into a shared library just seems wrong, both for readability and taste reasons. I'd rather not go down that path, really. It doesn't make sense to me at all.
Also, in what sense is it wasteful to link on-system tools to a shared library? Is it in the sense of loading time? Or is it in storage space? Please elaborate on what you mean by that. I am not entirely convinced of the actual wastefulness of my approach, and unless there's an indication (and by that I mean a proper diagnosis backed by some tests and metrics), I can't just agree with this statement or any related argument about this possible "problem".
To be clear, I don't intend the LibUtils library to contain all userland programs... it will only contain code for common CLI utilities that are shared with the BuggieBox binary. The amount of code being shared, compared to the negligible increase in load time and the related "problems", lends some support to my approach over the other options. Let's not forget that CLI utilities are not the main userland programs in the SerenityOS project, and for most use cases we use them sparingly, when we need them.
> Well, if I don't link the userland utilities against the library then compilation won't work.

That's why we would be "copying in" the cpp files directly.
> Also, how on earth is `#include <Userland/Libraries/LibUtil/tool.cpp>` supposed to work?

The same way that it works for header files, just that the ending is different. There is really no technical difference between either, and #pragma once isn't really needed because those .cpp files don't include other .cpp files.
> Even if it works, trying to include a `cpp` file is a horrible thing and we should not do that.

I'd argue that butchering an essentially random selection of command line utilities into a library is much more horrible. But in any case, that is why I made the second suggestion, which would be my preferred solution anyway.
> I thought about it, but it looks like an utter mess

I can see why one would think it's a mess, but on the scale of messiness it ranks below "some utilities are here as a whole, others are just a shim that links against a large library and then delegates to the correct entrypoint".
> sharing code in a library seems to me like the proper way to do that, and using `#ifdef` soup just to ensure we don't move the code into a shared library just seems wrong, both for readability and taste reasons.

I'd argue that a single #ifdef per (affected) utility does not classify as ifdef soup, especially if named properly.
> Also, in what sense is it wasteful to link on-system tools to a shared library? Is it in the sense of loading time? Or is it in storage space?

Not the storage space as in "on-disk", that would almost surely be bigger using separate binaries (although, if we end up making BuggieBox static, that is a moot point anyway). I'm mainly concerned about memory usage and loading times (and others apparently are as well, considering that there are currently PRs open that shave off ~500kB of memory usage on account of DynamicLoader having a separate memory allocator).
> I am not entirely convinced of the actual wastefulness of my approach, and unless there's an indication (and by that I mean a proper diagnosis backed by some tests and metrics), I can't just agree with this statement or any related argument about this possible "problem".

I don't currently have this branch checked out, so I don't have any actual metrics, but since I plan to make a working example of my suggestion anyways, I might as well try to get some results about that too.
But keep in mind that inefficiency is not my only argument (I'd say that it's only supporting), since I believe that this approach is architecturally flawed in general as well.
> To be clear, I don't intend the LibUtils library to contain all userland programs... it will only contain code for common CLI utilities that are shared with the BuggieBox binary.

This is precisely one of the architectural problems that I'm getting at, both from a build system standpoint as well as accessibility to the developer.
PS: I just saw that you updated this PR with changes to make BuggieBox link against dependencies statically. I'd suggest :yaksteps:, in which we first get the architecture right (or at least in a way where most people can agree with it), and only start trying to make everything static afterwards (maybe even in a separate PR). I feel like this will turn into a big and unreviewable yaktangle otherwise.
My suggestion is something like the following: f64326ef64c3ecc685dab8118e5b1229a7686500. Generating the help text from the list of tools I have yet to figure out.

| Metric | sc7's standalone cat | Tim's standalone cat |
|---|---|---|
| Virtual memory required during idle | 7.2 MiB | 4.6 MiB |
| Private memory required during idle | 334.0 KiB | 228.0 KiB |
| 1000 times `cat --help` | 20202 ms | 7792 ms |

Note that both of these have been tested at a point in time where the static changes aren't applied (not that that should matter for the standalone tools in the first place).
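To put the reported numbers in per-invocation terms, the arithmetic can be sketched like this. The figures are copied from the table above; the struct, the variable names and the assumption that the 1000 `cat --help` runs were timed as one total are illustrative only.

```cpp
#include <cstdio>

// One column of the table: idle memory figures and the total time for the
// 1000 `cat --help` invocations.
struct Measurement {
    double virtual_mib;
    double private_kib;
    double help_runs_ms;
};

constexpr Measurement sc7_cat { 7.2, 334.0, 20202.0 }; // "sc7's standalone cat"
constexpr Measurement tim_cat { 4.6, 228.0, 7792.0 };  // "Tim's standalone cat"
constexpr int help_runs = 1000;

// Average wall-clock time of a single invocation, in milliseconds.
constexpr double per_run_ms(Measurement const& m) { return m.help_runs_ms / help_runs; }

// How many times larger a is than b.
constexpr double ratio(double a, double b) { return a / b; }

void print_summary()
{
    std::printf("per invocation: %.3f ms vs %.3f ms\n", per_run_ms(sc7_cat), per_run_ms(tim_cat));
    std::printf("time ratio: %.2fx\n", ratio(sc7_cat.help_runs_ms, tim_cat.help_runs_ms));
    std::printf("private memory saved: %.1f KiB\n", sc7_cat.private_kib - tim_cat.private_kib);
}
```

That works out to roughly 20.2 ms versus 7.8 ms per invocation (about 2.59x), and 106 KiB less private memory at idle.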
Examining the metrics does shed some light on the discussion, thank you for doing these! I had a brief look at the commit you wrote and it looks quite OK, but I will have to look into this more deeply to examine how to integrate this correctly with other libraries to ensure the output binary of BuggieBox is almost statically compiled.
> I will have to look into this more deeply to examine how to integrate this correctly with other libraries to ensure the output binary of BuggieBox is almost statically compiled.

I assume that your follow-up commits for making things static would apply in roughly the same way; the only things that really changed for BuggieBox are that we don't rely on as many headers and that we pull the utility source files from the Utilities directory directly instead of having an intermediate library. We also somehow ended up with fewer dependencies (I think), but that may be an effect of everything being dynamic.
In any case, I'll also look into how to make this static (duplicating the library targets feels kind of "meh", but I don't have any better suggestions either). However, (I'm sure it gets old at this point), I wouldn't go beyond trying a few things until the dynamic version that we are basing this on has been looked at some more.
I pulled your changes and am now trying to apply my other commits on top. This looks very promising so far, and we should certainly do what you suggested - building directly from the source code in the Utilities directory seems to work well, so there's no need for a library :)