bootloader
bootloader copied to clipboard
Decrease Kernel Stack Size
The bootloader currently sets up huge stack for the kernel by default (512 pages = 2MB). Now that the stack size is configurable, we should probably reduce the default.
cc @64
How many pages do you think is a good default? I can do a bit of investigation to see how much is being used.
I'm not sure honestly. According to https://www.kernel.org/doc/html/latest/x86/kernel-stacks.html, Linux only uses a 8KiB stack, so we can definitely reduce it considerably.
On the blog_os/post-09 branch, 5 pages is the minimum that the kernel can use without crashing (in release, this number is 3). So maybe 10 or 16 would be a reasonable number?
Sounds reasonable.
I did a quick experiment using the -Z emit-stack-size
flag and the stack-sizes
crate:
Debug:
Address | Size | Function |
---|---|---|
0x000000000000ad90 | 8280 | spin::once::Once<T>::call_once::h51bca0b20078dc27 |
0x0000000000022100 | 4200 | x86_64::structures::idt::InterruptDescriptorTable::new::hd32c5c9dadb04897 |
0x000000000001a830 | 4184 | core::ops::function::FnOnce::call_once::hee973ff0694526b7 |
0x000000000001aa40 | 2776 | <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82 |
0x0000000000006ee0 | 1304 | blog_os::kernel_main::hc3a3a2c983130cd8 |
0x0000000000003960 | 832 | blog_os::interrupts::page_fault_handler::h204c58ca3e079357 |
0x0000000000013900 | 728 | core::fmt::Formatter::pad::h627264a917b3c4e5 |
Release:
Address | Size | Function |
---|---|---|
0x0000000000003de0 | 8248 | spin::once::Once<T>::call_once::h81ce430b91fc8bc5 |
0x00000000000069a0 | 3720 | x86_64::structures::idt::InterruptDescriptorTable::new::hff84b1851a4e5332 |
0x0000000000004040 | 272 | blog_os::interrupts::page_fault_handler::hca19ad914c532427 |
0x0000000000002970 | 264 | blog_os::kernel_main::h19c30b4a6b5987ca |
So the Once
type and the creation of the IDT is responsible for most of the stack use.
I tried to initialize the IDT without lazy static. Instead I used a Mutex protected static:
static IDT: spin::Mutex<InterruptDescriptorTable> = spin::Mutex::new(InterruptDescriptorTable::new());
This way, the large IDT is initialized at compile time and does not need to be constructed on the stack. To be able to load it, I needed to add a custom lock_leak
function to the Mutex
type, which locks the mutex indefinitely and returns a &'static mut
reference.
With this approach, the stack sizes are much smaller:
Debug:
Address | Size | Function |
---|---|---|
0x0000000000014210 | 2776 | <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82 |
0x0000000000008280 | 1416 | blog_os::kernel_main::h7f109f414e18afd3 |
0x000000000000cc00 | 896 | blog_os::interrupts::page_fault_handler::hf82cf137aee3c1b6 |
0x0000000000005400 | 728 | alloc::raw_vec::RawVec<T |
0x000000000000a880 | 568 | <core::slice::Iter<T> as core::iter::traits::iterator::Iterator>::try_fold::ha720e2a3fd963112 |
Release:
Address | Size | Function |
---|---|---|
0x0000000000004130 | 272 | blog_os::interrupts::page_fault_handler::h98b3318d6c4ff705 |
0x0000000000002960 | 264 | blog_os::kernel_main::h44cfb55497ed624e |
0x0000000000003bc0 | 248 | spin::once::Once<T>::call_once::h6653ebe0f1309568 |
0x00000000000043a0 | 224 | blog_os::interrupts::double_fault_handler::hef2e5697b10cc94b |
0x0000000000004040 | 216 | blog_os::interrupts::breakpoint_handler::h8a0b1ee731e065af |
0x0000000000003cb0 | 200 | spin::once::Once<T>::call_once::hd8582ddc822ac52c |
0x0000000000006e40 | 200 | x86_64::registers::control::x86_64::<impl x86_64::registers::control::Cr3>::read::h825613083c74e7df |
I managed to run the release binary with kernel-stack-size = 1
.
That’s excellent! I saw that crate too, but it seemed like it was going to be annoying to set up. Did you have to modify the linker script at all to get it to work? It would be good to have some instructions somewhere on how to do this; I’m happy to write something up.
It was definitely a bit annoying to set up. I also had some problems with using the cargo stack-sizes
command directly, I assume because of cargo-xbuild
.
The steps that worked for me were:
-
Create a linker script to preserve the
.stack_sizes
ELF section:/* file: keep-stack-sizes.x */ SECTIONS { /* `INFO` makes the section not allocatable so it won't be loaded into memory */ .stack_sizes (INFO) : { KEEP(*(.stack_sizes)); } }
-
Run
RUSTFLAGS="--sysroot /path/to/your/project/target/sysroot -Zemit-stack-sizes" cargo rustc --bin blog_os -- -C link-arg=-Tkeep-stack-sizes.x -C link-arg=-N
to create an ELF file with a.stack_sizes
section- Requires that you do a normal
cargo xbuild
before to create the sysroot.
- Requires that you do a normal
-
Install the
stack-sizes
crate:cargo install stack-sizes
-
Run
stack-sizes target/x86_64-blog_os/debug/blog_os > stack-sizes.csv
-
Sort the CSV file by size column. I used LibreOffice Calc for this, but any CSV tool should work.
-
Repeat the steps with
--release
andtarget/x86_64-blog_os/release/blog_os
if you like.
@phil-opp Hey, I know this is way above my paygrade currently, but would you be so kind to elaborate on the lock_leak
function you talked about earlier? The stack usage difference it causes is pretty big, so it'd be cool to talk about it in another part regarding interrupts on the os blog, if that'd be at all possible.
Thanks :)
@L3tum Sorry for the late reply! I definitely plan to implement some solution to this problem for the blog, I'm just not sure about the best approach yet.
Regarding the lock_leak
function: To load the IDT, we need a 'static
reference to it. The normal lock
function, however, only returns a reference that lives as long as the static variable is borrowed, i.e. only as long as the init_idt
function runs. To get a 'static
reference from the Mutex, we need an additional function that locks the lock indefinitely and returns a 'static
reference (which I called lock_leak
above).
Note that this is only one approach for solving this problem. Alternatively, we could try to make the set_handler
function a const
function, so that we could directly initialize it at compile time. If that's not possible, it might be preferable to create a new type instead of adding the lock_leak
function in order to prevent accidental deadlocks. For example, we could create a type that hands out a single &'static mut
reference and panics on subsequent calls.