bootloader
Decrease Kernel Stack Size
The bootloader currently sets up a huge stack for the kernel by default (512 pages = 2 MiB). Now that the stack size is configurable, we should probably reduce the default.
cc @64
How many pages do you think is a good default? I can do a bit of investigation to see how much is being used.
I'm not sure, honestly. According to https://www.kernel.org/doc/html/latest/x86/kernel-stacks.html, Linux only uses an 8 KiB stack, so we can definitely reduce it considerably.
On the blog_os/post-09 branch, 5 pages is the minimum that the kernel can use without crashing (in release, this number is 3). So maybe 10 or 16 would be a reasonable number?
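For a sense of scale, here is a small sketch that converts the page counts mentioned so far into KiB. It assumes the 4 KiB page size implied by "512 pages = 2 MiB"; the helper name `stack_kib` is made up for illustration and is not part of the bootloader.

```rust
/// Page size in bytes (assumption: standard 4 KiB x86_64 pages).
const PAGE_SIZE: u64 = 4096;

/// Converts a stack size given in pages to KiB.
/// Hypothetical helper, not a bootloader API.
fn stack_kib(pages: u64) -> u64 {
    pages * PAGE_SIZE / 1024
}

fn main() {
    // Current default, the two proposed defaults, and the observed minimums.
    for pages in [512, 16, 10, 5, 3] {
        println!("{:>3} pages = {:>4} KiB", pages, stack_kib(pages));
    }
}
```

So a default of 16 pages would be 64 KiB, compared with 2048 KiB today.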
Sounds reasonable.
I did a quick experiment using the -Z emit-stack-sizes flag and the stack-sizes crate:
Debug:
| Address | Size | Function |
|---|---|---|
| 0x000000000000ad90 | 8280 | spin::once::Once<T>::call_once::h51bca0b20078dc27 |
| 0x0000000000022100 | 4200 | x86_64::structures::idt::InterruptDescriptorTable::new::hd32c5c9dadb04897 |
| 0x000000000001a830 | 4184 | core::ops::function::FnOnce::call_once::hee973ff0694526b7 |
| 0x000000000001aa40 | 2776 | <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82 |
| 0x0000000000006ee0 | 1304 | blog_os::kernel_main::hc3a3a2c983130cd8 |
| 0x0000000000003960 | 832 | blog_os::interrupts::page_fault_handler::h204c58ca3e079357 |
| 0x0000000000013900 | 728 | core::fmt::Formatter::pad::h627264a917b3c4e5 |
Release:
| Address | Size | Function |
|---|---|---|
| 0x0000000000003de0 | 8248 | spin::once::Once<T>::call_once::h81ce430b91fc8bc5 |
| 0x00000000000069a0 | 3720 | x86_64::structures::idt::InterruptDescriptorTable::new::hff84b1851a4e5332 |
| 0x0000000000004040 | 272 | blog_os::interrupts::page_fault_handler::hca19ad914c532427 |
| 0x0000000000002970 | 264 | blog_os::kernel_main::h19c30b4a6b5987ca |
So the Once type and the creation of the IDT are responsible for most of the stack use.
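As a rough cross-check, the frame sizes can be rounded up to whole pages to get a lower bound on the stack size. This sketch assumes 4 KiB pages. The last line additionally assumes that the three largest debug frames are on the stack at the same time, which the tables alone do not confirm.

```rust
/// Page size in bytes (assumption: standard 4 KiB x86_64 pages).
const PAGE_SIZE: u64 = 4096;

/// Minimum number of whole pages needed to hold `bytes` of stack.
fn pages_needed(bytes: u64) -> u64 {
    (bytes + PAGE_SIZE - 1) / PAGE_SIZE
}

fn main() {
    // Largest single frame per build, taken from the tables above.
    println!("debug, largest frame:   {} pages", pages_needed(8280));
    println!("release, largest frame: {} pages", pages_needed(8248));
    // Assumption: these three debug frames are live simultaneously.
    println!(
        "debug, top three frames: {} pages",
        pages_needed(8280 + 4200 + 4184)
    );
}
```

The largest frame alone already needs three pages in both builds, which is consistent with the three-page minimum observed for release.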
I tried to initialize the IDT without lazy static. Instead I used a Mutex protected static:
static IDT: spin::Mutex<InterruptDescriptorTable> = spin::Mutex::new(InterruptDescriptorTable::new());
This way, the large IDT is initialized at compile time and does not need to be constructed on the stack. To be able to load it, I needed to add a custom lock_leak function to the Mutex type, which locks the mutex indefinitely and returns a &'static mut reference.
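To illustrate the idea outside of the kernel, here is a minimal sketch that runs on a normal host. `Entry` and `Table` are hypothetical stand-ins for the IDT types, and `std::sync::Mutex` (whose `new` is a `const fn` since Rust 1.63) stands in for `spin::Mutex`. Because `Table::new` is a `const fn`, the initial value of the static is computed at compile time instead of being built in a stack frame at runtime.

```rust
use std::sync::Mutex;

/// Stand-in for one 16-byte IDT entry (hypothetical type).
#[derive(Clone, Copy)]
struct Entry {
    handler: u64,
    options: u64,
}

/// Stand-in for `InterruptDescriptorTable`: 256 entries of 16 bytes each.
struct Table {
    entries: [Entry; 256],
}

impl Table {
    /// `const fn`, so it can be evaluated in a static initializer.
    const fn new() -> Self {
        Table {
            entries: [Entry { handler: 0, options: 0 }; 256],
        }
    }
}

// The initializer is evaluated at compile time; no temporary 4 KiB
// table has to be constructed on the stack when the program starts.
static TABLE: Mutex<Table> = Mutex::new(Table::new());

fn main() {
    let mut table = TABLE.lock().unwrap();
    table.entries[3].handler = 0x1000; // arbitrary example value
    table.entries[3].options = 1;
    println!("table size: {} bytes", std::mem::size_of::<Table>());
}
```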
With this approach, the stack sizes are much smaller:
Debug:
| Address | Size | Function |
|---|---|---|
| 0x0000000000014210 | 2776 | <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82 |
| 0x0000000000008280 | 1416 | blog_os::kernel_main::h7f109f414e18afd3 |
| 0x000000000000cc00 | 896 | blog_os::interrupts::page_fault_handler::hf82cf137aee3c1b6 |
| 0x0000000000005400 | 728 | alloc::raw_vec::RawVec<T |
| 0x000000000000a880 | 568 | <core::slice::Iter<T> as core::iter::traits::iterator::Iterator>::try_fold::ha720e2a3fd963112 |
Release:
| Address | Size | Function |
|---|---|---|
| 0x0000000000004130 | 272 | blog_os::interrupts::page_fault_handler::h98b3318d6c4ff705 |
| 0x0000000000002960 | 264 | blog_os::kernel_main::h44cfb55497ed624e |
| 0x0000000000003bc0 | 248 | spin::once::Once<T>::call_once::h6653ebe0f1309568 |
| 0x00000000000043a0 | 224 | blog_os::interrupts::double_fault_handler::hef2e5697b10cc94b |
| 0x0000000000004040 | 216 | blog_os::interrupts::breakpoint_handler::h8a0b1ee731e065af |
| 0x0000000000003cb0 | 200 | spin::once::Once<T>::call_once::hd8582ddc822ac52c |
| 0x0000000000006e40 | 200 | x86_64::registers::control::x86_64::<impl x86_64::registers::control::Cr3>::read::h825613083c74e7df |
I managed to run the release binary with kernel-stack-size = 1.
That’s excellent! I saw that crate too, but it seemed like it was going to be annoying to set up. Did you have to modify the linker script at all to get it to work? It would be good to have some instructions somewhere on how to do this; I’m happy to write something up.
It was definitely a bit annoying to set up. I also had some problems using the cargo stack-sizes command directly, presumably because of cargo-xbuild.
The steps that worked for me were:
- Create a linker script to preserve the `.stack_sizes` ELF section:

      /* file: keep-stack-sizes.x */
      SECTIONS
      {
        /* `INFO` makes the section not allocatable so it won't be loaded into memory */
        .stack_sizes (INFO) :
        {
          KEEP(*(.stack_sizes));
        }
      }

- Run `RUSTFLAGS="--sysroot /path/to/your/project/target/sysroot -Zemit-stack-sizes" cargo rustc --bin blog_os -- -C link-arg=-Tkeep-stack-sizes.x -C link-arg=-N` to create an ELF file with a `.stack_sizes` section.
  - Requires that you do a normal `cargo xbuild` before to create the sysroot.
- Install the `stack-sizes` crate: `cargo install stack-sizes`
- Run `stack-sizes target/x86_64-blog_os/debug/blog_os > stack-sizes.csv`
- Sort the CSV file by size column. I used LibreOffice Calc for this, but any CSV tool should work.
- Repeat the steps with `--release` and `target/x86_64-blog_os/release/blog_os` if you like.
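The sorting step could also be scripted instead of using a spreadsheet. The following is only a sketch: it assumes each line of the report has three whitespace-separated columns (address, size in bytes, symbol name), which may not match the exact output of the `stack-sizes` tool, so adjust the parsing as needed. The function name `sort_by_size` is made up, and the sample rows are taken from the debug table above with the hash suffixes left out.

```rust
/// One row of the report: (address, stack size in bytes, symbol name).
type Row = (String, u64, String);

/// Parses lines of the assumed form `<address> <size> <name>` and returns
/// them sorted by size, largest first. Lines that do not parse (such as a
/// header row) are skipped.
fn sort_by_size(report: &str) -> Vec<Row> {
    let mut rows: Vec<Row> = report
        .lines()
        .filter_map(|line| {
            let (addr, rest) = line.trim().split_once(char::is_whitespace)?;
            // The name may contain spaces, so only split off the size column.
            let (size, name) = rest.trim_start().split_once(char::is_whitespace)?;
            let size = size.parse::<u64>().ok()?;
            Some((addr.to_string(), size, name.trim().to_string()))
        })
        .collect();
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    rows
}

fn main() {
    let report = "address stack name\n\
                  0x0000000000006ee0 1304 blog_os::kernel_main\n\
                  0x000000000000ad90 8280 spin::once::Once<T>::call_once\n\
                  0x000000000001aa40 2776 <pc_keyboard::KeyCode as core::fmt::Debug>::fmt\n";
    for (addr, size, name) in sort_by_size(report) {
        println!("{addr}\t{size}\t{name}");
    }
}
```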
@phil-opp Hey, I know this is way above my pay grade currently, but would you be so kind as to elaborate on the lock_leak function you mentioned earlier? The difference in stack usage it makes is pretty big, so it'd be cool to cover it in another part about interrupts on the OS blog, if that's at all possible.
Thanks :)
@L3tum Sorry for the late reply! I definitely plan to implement some solution to this problem for the blog, I'm just not sure about the best approach yet.
Regarding the lock_leak function: To load the IDT, we need a 'static reference to it. The normal lock function, however, only returns a reference that lives as long as the static variable is borrowed, i.e. only as long as the init_idt function runs. To get a 'static reference from the Mutex, we need an additional function that locks the mutex indefinitely and returns a 'static reference (which I called lock_leak above).
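To make this more concrete, here is a minimal sketch of what such a function could look like. It is not the code I actually used and not part of the `spin` crate: the `Mutex` below is a simplified spinlock written from scratch for illustration, with only the pieces needed to show `lock_leak`.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Simplified spinlock, a stand-in for `spin::Mutex` (illustration only).
pub struct Mutex<T> {
    locked: AtomicBool,
    data: UnsafeCell<T>,
}

// Safety: the data is only accessed by whoever holds the lock.
unsafe impl<T: Send> Sync for Mutex<T> {}

impl<T> Mutex<T> {
    pub const fn new(value: T) -> Self {
        Mutex {
            locked: AtomicBool::new(false),
            data: UnsafeCell::new(value),
        }
    }

    /// Returns true if the lock is currently held.
    pub fn is_locked(&self) -> bool {
        self.locked.load(Ordering::Relaxed)
    }

    /// Locks the mutex and never unlocks it again. Requiring `&'static self`
    /// is what allows returning a `&'static mut` reference to the data.
    pub fn lock_leak(&'static self) -> &'static mut T {
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        // Safety: the lock is held forever from here on, so no second
        // reference to the data can ever be created through this mutex.
        unsafe { &mut *self.data.get() }
    }
}

static LEAKED: Mutex<[u64; 4]> = Mutex::new([0; 4]);

fn main() {
    let table: &'static mut [u64; 4] = LEAKED.lock_leak();
    table[0] = 42;
    // The mutex stays locked; a second `lock_leak` call would spin forever.
    assert!(LEAKED.is_locked());
    println!("first entry: {}", table[0]);
}
```

The downside is visible in `main`: any later attempt to lock the same mutex never returns.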
Note that this is only one approach for solving this problem. Alternatively, we could try to make the set_handler function a const function, so that we could initialize the IDT directly at compile time. If that's not possible, it might be preferable to create a new type instead of adding the lock_leak function in order to prevent accidental deadlocks. For example, we could create a type that hands out a single &'static mut reference and panics on subsequent calls.
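A rough sketch of such a type is shown below. The name `TakeOnce` and its methods are invented for illustration and do not refer to an existing crate. The first caller gets the `&'static mut` reference, and every later call fails loudly instead of deadlocking.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical wrapper that hands out its contents exactly once.
pub struct TakeOnce<T> {
    taken: AtomicBool,
    data: UnsafeCell<T>,
}

// Safety: at most one reference to the data is ever handed out.
unsafe impl<T: Send> Sync for TakeOnce<T> {}

impl<T> TakeOnce<T> {
    pub const fn new(value: T) -> Self {
        TakeOnce {
            taken: AtomicBool::new(false),
            data: UnsafeCell::new(value),
        }
    }

    /// Returns the unique reference on the first call and `None` afterwards.
    pub fn try_take(&'static self) -> Option<&'static mut T> {
        if self.taken.swap(true, Ordering::AcqRel) {
            None
        } else {
            // Safety: `swap` returned `false` only for the very first caller.
            Some(unsafe { &mut *self.data.get() })
        }
    }

    /// Like `try_take`, but panics on every call after the first.
    pub fn take(&'static self) -> &'static mut T {
        self.try_take().expect("TakeOnce: value was already taken")
    }
}

static TABLE: TakeOnce<[u64; 4]> = TakeOnce::new([0; 4]);

fn main() {
    let table: &'static mut [u64; 4] = TABLE.take();
    table[0] = 42;
    // A second `TABLE.take()` would panic instead of spinning forever.
    assert!(TABLE.try_take().is_none());
    println!("first entry: {}", table[0]);
}
```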