Explore binary overhead in mixed Rust + C/C++ projects

Open mean00 opened this issue 3 years ago • 8 comments

Hi, first of all, thank you for this nice project. I was looking at it with the goal of embedding it on a small ARM or RISC-V board and talking to it over USB/CDC.

  • Arm support : check
  • Riscv support : check
  • No_std : check
  • Simple to use API: check

So far it looked like a perfect match. But then I tried to build the no_std example on top of my project baseline, which is a mix of C++ and Rust. The base project is a ~20 kB binary.

It built fine, except it consumed a metric ton of flash (I guess this is all relative :) ). With a lazy setup I was at around 600 kB; using all the tricks I know, it went down to ~200 kB.

I looked into why: ~80 kB or so is gdbstub/gdbstub_arch itself, and ~120 kB comes from dependencies pulled in from core, fmt, ... In the example, calling gdb.incoming_data(&mut target, byte) is enough for the final executable to jump from 20 kB to 220 kB (binary size), since the code is then no longer considered unused and is no longer removed.

So a couple of questions:

  • Is this the expected size for the no_std version, or did I do something silly?
  • Do you have any hints on how to shrink it down?

Thank you again

mean00 avatar Mar 02 '22 07:03 mean00

I'm excited to hear that you're taking a crack at using gdbstub in a resource-constrained environment! Thus far, we've had folks validate that gdbstub works with no_std, but no one has really cared about the binary size footprint. I'm looking forward to the insights that come out of this project...

In any case, to answer your question: there are some potentially non-obvious tricks that you can use to trim the binary down even more:

  • when adding gdbstub{_arch} in your Cargo.toml, set default-features = false, and make sure that the trace-pkt feature is off
  • as per the example_no_std, you can specify the log crate to be entirely disabled by setting log = { version = "0.4", features = ["release_max_level_off"] } in your Cargo.toml. trace logging takes up space even if it isn't enabled (and pulls in dependencies on Rust's unreasonably expensive built-in formatting infrastructure)
  • make sure you're using #[inline(always)] on IDET methods, and also that you're not using #[inline(never)] on your actual handler methods (which is something I've noticed folks tend to leave in when copying the example_no_std code). As a general point of advice, gdbstub relies pretty heavily on aggressive inlining to achieve reasonable binary sizes in release builds, so you may need to "play around" a bit with sprinkling #[inline(always)] around the project to get things nice and tight.
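For reference, the first two suggestions could look roughly like this in a Cargo.toml. This is only a sketch: the gdbstub and gdbstub_arch version numbers are assumptions and should be replaced with whatever the project actually uses.

```toml
# Cargo.toml (sketch; the gdbstub / gdbstub_arch versions are assumed)
[dependencies]
# Opt out of the default feature set so nothing is enabled implicitly.
gdbstub = { version = "0.6", default-features = false }
gdbstub_arch = { version = "0.2", default-features = false }
# Compile out all log macros in release builds, as in example_no_std.
log = { version = "0.4", features = ["release_max_level_off"] }
```

Note that Cargo unifies features across the dependency graph, so another crate that depends on gdbstub with default features would switch them back on.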

Let me know if that helps the situation somewhat.

Also, I'm not sure if you know about this, but cargo bloat is an excellent tool for figuring out what's taking up space in your final binary. Do check it out if you haven't already.

daniel5151 avatar Mar 02 '22 16:03 daniel5151

Hey, I was wondering if you ever had a chance to explore some of these ideas?

daniel5151 avatar Mar 18 '22 16:03 daniel5151

Hi, sorry, I forgot to reply. So, unfortunately, no: the (good) ideas you proposed were more or less the setup I was already using. From what I saw, the generics are generating tons of code (relatively speaking; for me 200 kB is tons :) ). I'll have to use a bigger MCU with a lot more flash. I'm still planning to go for it, it's just a slight delay.

Thank you

mean00 avatar Mar 18 '22 17:03 mean00

When you say "the generics", can you elaborate a bit? I'd be very interested to see the output of whatever tools you're using to gather your data. In addition, if it's at all possible, I would love it if you could share some kind of minimal repro of the behavior you're seeing (if not publicly, then privately) so that I could dig into what's going on myself.

Most of my claims regarding gdbstub's small size are based on the notion that the optimizing compiler is smart enough to eliminate the dead code of all the unused protocol extensions. In my experience, this has been the case on x86/64, but it may be that other platforms don't get the same kind of optimizations. If so, I would be very interested in digging into that.
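To illustrate the dead-code-elimination argument, here is a simplified, self-contained model of the "optional extension" pattern. It is not gdbstub's real API: every name in it (Target, Breakpoints, handle_packet, and so on) is hypothetical. The idea is that for a target which keeps the default `None`, an inlined `support_breakpoints` is a compile-time constant, so the optimizer can in principle drop the breakpoint branch from that target's copy of the dispatcher.

```rust
// Simplified model of the optional-extension pattern; NOT gdbstub's real API.
// All trait, type and function names here are hypothetical.

/// An optional protocol extension that a target may implement.
pub trait Breakpoints {
    fn add_breakpoint(&mut self, addr: u32) -> bool;
}

/// The core target trait. Extensions default to "not supported".
pub trait Target {
    #[inline(always)]
    fn support_breakpoints(&mut self) -> Option<&mut dyn Breakpoints> {
        None
    }
}

/// A target that opts out of every extension.
pub struct Minimal;
impl Target for Minimal {}

/// A target that opts in to breakpoints, with four slots.
pub struct WithBreakpoints {
    pub breakpoints: [Option<u32>; 4],
}

impl Breakpoints for WithBreakpoints {
    fn add_breakpoint(&mut self, addr: u32) -> bool {
        // Store the address in the first free slot, if there is one.
        match self.breakpoints.iter_mut().find(|slot| slot.is_none()) {
            Some(slot) => {
                *slot = Some(addr);
                true
            }
            None => false,
        }
    }
}

impl Target for WithBreakpoints {
    #[inline(always)]
    fn support_breakpoints(&mut self) -> Option<&mut dyn Breakpoints> {
        Some(self)
    }
}

/// Generic dispatcher, monomorphized once per target type.
/// An empty reply stands for "packet not supported".
pub fn handle_packet<T: Target>(target: &mut T, packet: &[u8]) -> &'static str {
    match packet.first() {
        Some(b'Z') => match target.support_breakpoints() {
            // The address is hard-coded purely for illustration.
            Some(ops) => {
                if ops.add_breakpoint(0x0800_0000) {
                    "OK"
                } else {
                    "E01"
                }
            }
            None => "",
        },
        _ => "",
    }
}
```

Whether the unused branch really disappears depends on the optimization level and on inlining actually happening, which is why an unoptimized build or a stray #[inline(never)] can make such a large difference.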

daniel5151 avatar Mar 18 '22 18:03 daniel5151

Hi, yes: https://github.com/mean00/lnGdb.git It is a mix of Rust and C++. It's probably better to wait a bit: it was a work in progress until the flash consumption became a big issue. I was targeting a GD32F303, which has ~200 kB of flash and ~40 kB of RAM.

Edit platformConfig.cmake to point to a valid arm cross compiler (arm-none-eabi-)

The project generates a .map file, which can be analyzed to see which file/symbol takes up a lot of room. Example (symbol followed by its size):

gdbstub::stub::core_impl::monitor_cmd::_$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::handle_monitor_cmd::h45fe3ef669a69925 468
gdbstub::protocol::commands::breakpoint::BasicBreakpoint::from_slice::h89b6aea9ef2c483b 478
core::fmt::builders::DebugStruct::field::h4ff98cd827b88c97 480
gdbstub::protocol::commands::vCont::ActionsBuf::iter::$u7b$$u7b$closure$u7d$$u7d$::h23307dba026bd422 496
_$LT$gdbstub..protocol..commands.._m_upcase..M$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::hf4eb86cf9629345d 498
_$LT$gdbstub..protocol..commands..x_upcase..X$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::h71358d7d5e7fcd83 498
gdbstub::stub::core_impl::resume::$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::write_stop_common::ha13950f7549e7b57 506
$LT$gdbstub..protocol..common..thread_id..ThreadId$u20$as$u20$core..convert..TryFrom$LT$$RF$$u5b$u8$u5d$$GT$$GT$::try_from::h0899b4192942d85e 506
xTaskCreate 524
core::unicode::printable::check::h442633f5347a6734 560
gdbstub::protocol::common::hex::decode_bin_buf::hd54283ae8cec642f 568
gdbstub::protocol::commands::vCont::VContKind::from_bytes::ha894a70d0513ffee 574
$LT$gdbstub..protocol..commands..vFile_pwrite..vFilePwrite$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::hd821eac2d360b690 580
gdbstub::stub::core_impl::catch_syscalls::$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::handle_catch_syscalls::h37002b567c9d62be 606
gdbstub::stub::core_impl::memory_map::$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::handle_memory_map::h5edeaf2d4d4fdb37 626
gdbstub::stub::core_impl::auxv::$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::handle_auxv::h0409a62fef7b3c12 626
gdbstub::stub::core_impl::exec_file::$LT$impl$u20$gdbstub..stub..core_impl..GdbStubImpl$LT$T$C$C$GT$$GT$::handle_exec_file::h2d18e442307cd331 630
_$LT$gdbstub..protocol..commands.._vFile_open..vFileOpen$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::h5fff8c07d1d1236c 668
_$LT$gdbstub..protocol..commands.._m..m$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::heb9e9c82edc3b24b 678
_$LT$gdbstub_arch..arm..reg..arm_core..ArmCoreRegs$u20$as$u20$gdbstub..arch..Registers$GT$::gdb_deserialize::h699aa263db561e8a 692
_$LT$gdbstub..protocol..commands.._vFile_pread..vFilePread$u20$as$u20$gdbstub..protocol..commands..ParseCommand$GT$::from_packet::h866cfc5012c0c297 694
gdbstub::protocol::packet::Packet::from_buf::h13db2c27fc8b9043 702
_$LT$core..iter..adapters..flatten..FlattenCompat$LT$I$C$U$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::size_hint::hee6240d592c7819f 708
gdbstub::stub::GdbStub$LT$T$C$C$GT$::run_state_machine::h132217be75fd931c 722
_$LT$gdbstub_arch..arm..reg..arm_core..ArmCoreRegs$u20$as$u20$gdbstub..arch..Registers$GT$::gdb_serialize::h4ab9b793a0e466c8 736
gdbstub::protocol::packet::PacketBuf::new::h916aa25f9a

There are a lot of smallish functions that look like generics and, when summed up, take a lot of space.
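One way to do that summing is a small host-side helper like the sketch below. It assumes the listing has already been reduced to whitespace-separated `symbol size` pairs, as in the excerpt above; a real .map file would need a format-specific parser first. The crate list and the "earliest crate name found in the symbol" heuristic are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Crate names to look for (an assumed list). `gdbstub_arch` comes first so
/// that it is not counted as plain `gdbstub` when both match at one position.
const CRATES: [&str; 3] = ["gdbstub_arch", "gdbstub", "core"];

/// Attribute a symbol to whichever known crate name appears earliest in it.
pub fn owning_crate(symbol: &str) -> &'static str {
    let mut best: Option<(usize, &'static str)> = None;
    for name in CRATES {
        if let Some(pos) = symbol.find(name) {
            // Keep the earliest match; on a tie the name listed first wins.
            if best.map_or(true, |(p, _)| pos < p) {
                best = Some((pos, name));
            }
        }
    }
    best.map_or("other", |(_, name)| name)
}

/// Sum sizes per crate from whitespace-separated `symbol size` pairs.
/// A trailing symbol without a size ends the scan, and a pair whose size
/// does not parse as a number is skipped.
pub fn size_by_crate(listing: &str) -> BTreeMap<&'static str, u64> {
    let mut totals = BTreeMap::new();
    let mut tokens = listing.split_whitespace();
    while let (Some(symbol), Some(size)) = (tokens.next(), tokens.next()) {
        if let Ok(bytes) = size.parse::<u64>() {
            *totals.entry(owning_crate(symbol)).or_insert(0) += bytes;
        }
    }
    totals
}
```

Matching on the earliest crate name means a symbol such as `gdbstub::stub::core_impl::...` is attributed to gdbstub rather than to core, even though "core" appears later in the path. Anything that matches none of the names (for example xTaskCreate) lands in "other".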

mean00 avatar Mar 19 '22 18:03 mean00

Sure, I don't mind waiting a bit and circling back.

In the meantime though, can you confirm that the repo you linked is the most up-to-date version of the code? That repo contains a lot of code that doesn't follow those suggestions I mentioned earlier...

As for what might be happening here, a few things spring to mind:

  • since this is a cross-language project (where Cargo is being "driven" by an outside build system), you should double check and confirm that your Rust code is indeed getting compiled in release mode with optimizations on.
  • given that this is a cross-language (and cross toolchain!) compile, I wonder if LTO is failing to kick in?
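As a point of comparison, a size-oriented release profile often looks something like the sketch below. These are standard Cargo profile keys, but the particular values are a common starting point rather than settings taken from this project.

```toml
# Cargo.toml (sketch): size-oriented release profile
[profile.release]
opt-level = "z"     # optimize for size; "s" is a less aggressive alternative
lto = true          # "fat" LTO across the Rust crates in the dependency graph
codegen-units = 1   # a single codegen unit gives the optimizer a wider view
panic = "abort"     # no unwinding machinery
```

The profile only applies if the outer build system actually invokes Cargo with --release. Also, lto = true covers the Rust crates only; it does not by itself make the final link performed by an external C/C++ linker an LTO link.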

daniel5151 avatar Mar 19 '22 20:03 daniel5151

Hi, my local copy contains changes that need cleanup, and it includes the suggestions. Rust is built in dev mode, BUT the flags are about the same as in release mode (otherwise the code would be much, much bigger).

For the last part, that's probably a big chunk of the issue. The final code is linked using the C++ linker, which does not do LTO, as it tends to remove a bit too much of the C++ code :), such as "useless" things like interrupt handlers. I need to mark them as used.

It may be as simple as that. I'll have to make LTO work and see what happens. Thank you.

mean00 avatar Mar 20 '22 06:03 mean00

Well, let me know when you push those changes somewhere I can take a look at them myself, and then we can continue this exploration :)

daniel5151 avatar Mar 20 '22 17:03 daniel5151

Hi, after a short break (...), I'm looking into this again. It seems that switching the whole toolchain to LLVM/clang for both C++ and Rust helps a lot. With empty shells for conn & friends, the cost of gdbstub is only ~10 kB, which is what is expected (more or less).

(rust/llvm LTO does not seem to work with gcc LTO)
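For anyone trying to reproduce this, rustc documents a "linker-plugin-based LTO" mode intended for exactly this kind of mixed Rust + C/C++ link. The fragment below is a generic sketch based on that documentation, not the configuration used in the project discussed here.

```toml
# .cargo/config.toml (sketch)
[build]
# Emit LLVM bitcode so that an LLVM-based linker can run LTO across
# the Rust and C/C++ objects at final link time.
rustflags = ["-Clinker-plugin-lto"]
```

On the C/C++ side this is typically paired with compiling with clang using -flto and linking through lld (for example with -fuse-ld=lld). The rustc documentation also notes that the LLVM version behind rustc and the one behind clang need to be compatible.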

mean00 avatar Oct 01 '22 18:10 mean00

Ah, that's great to hear! That makes a lot of sense.

Should I close this issue, or do you think this is something we should document somewhere (e.g: in the README)?

daniel5151 avatar Oct 02 '22 11:10 daniel5151

Please close it. Indeed maybe a pointer in the doc would be helpful for others. Thank you.

mean00 avatar Oct 02 '22 15:10 mean00