bootloader icon indicating copy to clipboard operation
bootloader copied to clipboard

Decrease Kernel Stack Size

Open phil-opp opened this issue 4 years ago • 9 comments

The bootloader currently sets up huge stack for the kernel by default (512 pages = 2MB). Now that the stack size is configurable, we should probably reduce the default.

cc @64

phil-opp avatar Sep 14 '19 07:09 phil-opp

How many pages do you think is a good default? I can do a bit of investigation to see how much is being used.

64 avatar Sep 14 '19 16:09 64

I'm not sure honestly. According to https://www.kernel.org/doc/html/latest/x86/kernel-stacks.html, Linux only uses a 8KiB stack, so we can definitely reduce it considerably.

phil-opp avatar Sep 14 '19 17:09 phil-opp

On the blog_os/post-09 branch, 5 pages is the minimum that the kernel can use without crashing (in release, this number is 3). So maybe 10 or 16 would be a reasonable number?

64 avatar Sep 17 '19 00:09 64

Sounds reasonable.

I did a quick experiment using the -Z emit-stack-size flag and the stack-sizes crate:

Debug:

Address Size Function
0x000000000000ad90 8280 spin::once::Once<T>::call_once::h51bca0b20078dc27
0x0000000000022100 4200 x86_64::structures::idt::InterruptDescriptorTable::new::hd32c5c9dadb04897
0x000000000001a830 4184 core::ops::function::FnOnce::call_once::hee973ff0694526b7
0x000000000001aa40 2776 <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82
0x0000000000006ee0 1304 blog_os::kernel_main::hc3a3a2c983130cd8
0x0000000000003960 832 blog_os::interrupts::page_fault_handler::h204c58ca3e079357
0x0000000000013900 728 core::fmt::Formatter::pad::h627264a917b3c4e5

Release:

Address Size Function
0x0000000000003de0 8248 spin::once::Once<T>::call_once::h81ce430b91fc8bc5
0x00000000000069a0 3720 x86_64::structures::idt::InterruptDescriptorTable::new::hff84b1851a4e5332
0x0000000000004040 272 blog_os::interrupts::page_fault_handler::hca19ad914c532427
0x0000000000002970 264 blog_os::kernel_main::h19c30b4a6b5987ca

So the Once type and the creation of the IDT is responsible for most of the stack use.

phil-opp avatar Sep 18 '19 08:09 phil-opp

I tried to initialize the IDT without lazy static. Instead I used a Mutex protected static:

static IDT: spin::Mutex<InterruptDescriptorTable> = spin::Mutex::new(InterruptDescriptorTable::new());

This way, the large IDT is initialized at compile time and does not need to be constructed on the stack. To be able to load it, I needed to add a custom lock_leak function to the Mutex type, which locks the mutex indefinitely and returns a &'static mut reference.

With this approach, the stack sizes are much smaller:

Debug:

Address Size Function
0x0000000000014210 2776 <pc_keyboard::KeyCode as core::fmt::Debug>::fmt::hda7c4f2f9fb20e82
0x0000000000008280 1416 blog_os::kernel_main::h7f109f414e18afd3
0x000000000000cc00 896 blog_os::interrupts::page_fault_handler::hf82cf137aee3c1b6
0x0000000000005400 728 alloc::raw_vec::RawVec<T
0x000000000000a880 568 <core::slice::Iter<T> as core::iter::traits::iterator::Iterator>::try_fold::ha720e2a3fd963112

Release:

Address Size Function
0x0000000000004130 272 blog_os::interrupts::page_fault_handler::h98b3318d6c4ff705
0x0000000000002960 264 blog_os::kernel_main::h44cfb55497ed624e
0x0000000000003bc0 248 spin::once::Once<T>::call_once::h6653ebe0f1309568
0x00000000000043a0 224 blog_os::interrupts::double_fault_handler::hef2e5697b10cc94b
0x0000000000004040 216 blog_os::interrupts::breakpoint_handler::h8a0b1ee731e065af
0x0000000000003cb0 200 spin::once::Once<T>::call_once::hd8582ddc822ac52c
0x0000000000006e40 200 x86_64::registers::control::x86_64::<impl x86_64::registers::control::Cr3>::read::h825613083c74e7df

I managed to run the release binary with kernel-stack-size = 1.

phil-opp avatar Sep 18 '19 08:09 phil-opp

That’s excellent! I saw that crate too, but it seemed like it was going to be annoying to set up. Did you have to modify the linker script at all to get it to work? It would be good to have some instructions somewhere on how to do this; I’m happy to write something up.

64 avatar Sep 18 '19 14:09 64

It was definitely a bit annoying to set up. I also had some problems with using the cargo stack-sizes command directly, I assume because of cargo-xbuild.

The steps that worked for me were:

  • Create a linker script to preserve the .stack_sizes ELF section:

    /* file: keep-stack-sizes.x */
    SECTIONS
    {
      /* `INFO` makes the section not allocatable so it won't be loaded into memory */
      .stack_sizes (INFO) :
      {
        KEEP(*(.stack_sizes));
      }
    }
    
  • Run RUSTFLAGS="--sysroot /path/to/your/project/target/sysroot -Zemit-stack-sizes" cargo rustc --bin blog_os -- -C link-arg=-Tkeep-stack-sizes.x -C link-arg=-N to create an ELF file with a .stack_sizes section

    • Requires that you do a normal cargo xbuild before to create the sysroot.
  • Install the stack-sizes crate: cargo install stack-sizes

  • Run stack-sizes target/x86_64-blog_os/debug/blog_os > stack-sizes.csv

  • Sort the CSV file by size column. I used LibreOffice Calc for this, but any CSV tool should work.

  • Repeat the steps with --release and target/x86_64-blog_os/release/blog_os if you like.

phil-opp avatar Sep 18 '19 17:09 phil-opp

@phil-opp Hey, I know this is way above my paygrade currently, but would you be so kind to elaborate on the lock_leak function you talked about earlier? The stack usage difference it causes is pretty big, so it'd be cool to talk about it in another part regarding interrupts on the os blog, if that'd be at all possible.

Thanks :)

L3tum avatar Oct 20 '19 14:10 L3tum

@L3tum Sorry for the late reply! I definitely plan to implement some solution to this problem for the blog, I'm just not sure about the best approach yet.

Regarding the lock_leak function: To load the IDT, we need a 'static reference to it. The normal lock function, however, only returns a reference that lives as long as the static variable is borrowed, i.e. only as long as the init_idt function runs. To get a 'static reference from the Mutex, we need an additional function that locks the lock indefinitely and returns a 'static reference (which I called lock_leak above).

Note that this is only one approach for solving this problem. Alternatively, we could try to make the set_handler function a const function, so that we could directly initialize it at compile time. If that's not possible, it might be preferable to create a new type instead of adding the lock_leak function in order to prevent accidental deadlocks. For example, we could create a type that hands out a single &'static mut reference and panics on subsequent calls.

phil-opp avatar Nov 02 '19 22:11 phil-opp