cortex-m-rt
                        [RFC] `#[ramfunc]`
This is a 2 in 1 RFC and implementation PR. This builds on top of #90 so you can ignore the first commit. See #42 for some background information.
@rust-embedded/cortex-m Please review this RFC and if you are in favor approve this PR.
Summary
Add a #[ramfunc] attribute to this crate. This attribute can be applied to
functions to place them in RAM.
Motivation
Running functions from RAM may result in faster execution times. Thus placing "hot code" in RAM may improve the performance of an application.
Design and rationale
Expansion
The #[ramfunc] attribute will expand into a #[link_section = ..]
attribute that will place the function in a .ramfunc.$hash section. Each
#[ramfunc] function will be placed in a different linker section (in a
similar fashion to -ffunction-sections) to let the linker GC (Garbage Collect)
the unused functions. To this end, #[ramfunc] won't accept generic functions
because if the generic function has several instantiations all of them would end
in the same linker section.
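As a host-side sketch, the expansion would be roughly the following (the `.ramfunc.1a2b3c` section suffix is illustrative, not the crate's actual hash scheme):

```rust
// What `#[ramfunc] fn hot() -> u32 { .. }` might expand to. The section
// name suffix stands in for the per-function hash; on the target, the
// linker script collects all `.ramfunc.*` input sections into RAM.
#[link_section = ".ramfunc.1a2b3c"]
#[inline(never)] // keep the symbol out of line so the section placement matters
fn hot() -> u32 {
    42
}

fn main() {
    // On the target `hot` would execute from RAM after Reset copies the
    // `.ramfunc` section over; on a host this just runs normally.
    assert_eq!(hot(), 42);
}
```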
link.x will be tweaked to instruct the linker to collect all the input
sections of the form .ramfunc.* into a single .ramfunc output section that
will be placed in RAM.
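A hypothetical sketch of the link.x addition (the `__sramfunc` / `__eramfunc` / `__siramfunc` symbol names are illustrative, not necessarily what the implementation uses):

```text
/* hypothetical sketch, not the actual link.x contents */
.ramfunc : ALIGN(4)
{
  . = ALIGN(4);
  __sramfunc = .;
  *(.ramfunc .ramfunc.*);
  . = ALIGN(4);
  __eramfunc = .;
} > RAM AT > FLASH

/* load address in Flash, used by Reset to copy the section into RAM */
__siramfunc = LOADADDR(.ramfunc);
```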
The Reset handler will be updated to initialize the .ramfunc section before
the user entry point is called.
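That initialization would mirror the existing .data copy loop. A host-testable sketch of the word-copy routine (on the target the three pointers would come from linker-provided symbols; here they are exercised with plain arrays):

```rust
// Word-copy loop in the style of Reset's `.data` initialization. On the
// target, `dst`/`end` would span `.ramfunc` in RAM and `src` would point
// at its load address in Flash (symbol names for those are hypothetical).
unsafe fn init_ram_section(mut dst: *mut u32, end: *mut u32, mut src: *const u32) {
    while dst < end {
        dst.write(src.read());
        dst = dst.add(1);
        src = src.add(1);
    }
}

fn main() {
    let src = [0xdead_beef_u32, 0x0bad_cafe];
    let mut dst = [0u32; 2];
    unsafe {
        init_ram_section(dst.as_mut_ptr(), dst.as_mut_ptr().add(2), src.as_ptr());
    }
    assert_eq!(dst, src);
}
```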
Why an attribute?
This is implemented as an attribute to make it composable with the attributes to be added in RFC #90. For example, you can place an exception handler in RAM. The compiler won't generate a veneer in this case because the address of the RAM location will be placed in the vector table.
#[exception(SysTick)]
#[ramfunc]
fn systick() {
    // do stuff
}
Why a separate linker section?
Using a separate section lets the compiler / linker assign the right ELF flags
to .ramfunc (i.e. "AX"). That way .ramfunc shows up as an allocated and executable section in the output of readelf -S and gets disassembled by objdump -Cd.
$ readelf -S app
  [ 1] .vector_table     PROGBITS        08000000 010000 000400 00   A  0   0  4
  [ 2] .text             PROGBITS        08000400 010400 0003ea 00  AX  0   0  2
  [ 3] .rodata           PROGBITS        080007ec 020020 000000 00  WA  0   0  4
  [ 4] .data             PROGBITS        20000000 020000 000004 00  WA  0   0  4
  [ 5] .ramfunc          PROGBITS        20000004 020004 00001c 00  AX  0   0  4
  [ 6] .bss              NOBITS          20000020 000000 000000 00  WA  0   0  4
$ arm-none-eabi-objdump -Cd app
Disassembly of section .text:
08000400 <Reset>:
 8000400:       f000 f9ee       bl      80007e0 <DefaultPreInit>
 (..)
Disassembly of section .ramfunc:
20000004 <SysTick>:
20000004:       b580            push    {r7, lr}
(..)
This also means that .data would only contain ... data so the section can be
inspected (not disassembled) separately.
$ arm-none-eabi-objdump -s -j .data app
Contents of section .data:
 20000000 01000000                             ....
The downside (?) is that the default format of size will report .ramfunc
under text.
$ size app
   text    data     bss     dec     hex filename
   2066       4       4    2074     81a app
But in any case it's (usually) better to look at the output in System V format.
$ size -Ax app
app  :
section             size         addr
.vector_table      0x400    0x8000000
.text              0x3ea    0x8000400
.rodata              0x0    0x80007ec
.data                0x4   0x20000000
.ramfunc            0x28   0x20000004
.bss                 0x4   0x2000002c
Unresolved questions
- Is there a reliable way to initialize both .data and .ramfunc in a single for loop? (see init_data in the Reset function) Right now I initialize each one separately and this produces more code. I haven't tried initializing them in a single pass.
- Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free to inline functions marked as #[ramfunc] into other functions. That could cause the function to not end in RAM.
- Bikeshed the attribute name and linker section name? IAR and TI use the name ramfunc for this functionality.
Oh and I have not tested this on hardware. I have only looked at the output of objdump so it would be good if someone tested it to see if it actually works ...
@japaric This is fantastic! Especially the ability to have exception/interrupt handlers without veneer.
Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free to inline functions marked as #[ramfunc] into other functions. That could cause the function to not end in RAM.
Yes, we should do that.
Something that should be mentioned in the docs is that #[ramfunc] does not automatically propagate through the call stack. By this I mean that code like:
#[ramfunc]
fn foo() {
    // ..
    bar();
    // ..
}
fn bar() { .. }
Could result in foo being executed from RAM and bar being executed from Flash unless bar gets inlined into foo but that would be up to the compiler.
Sure you can add #[ramfunc] to bar if you wrote it but if you are calling third party code, like libm, you have pretty much no control over whether something like sinf will end in Flash or RAM.
Also I agree with @therealprof, a ramfunc should not be inlined, that destroys the purpose. :)
Writing RAM functions was a bit of an art before: you had to make sure the entire call tree is in RAM. I do not believe we can auto-promote other functions to RAM, though, and more importantly I'm not sure we should.
Sounds good to me. I'm quite used to this sort of thing from working with big-name RTOS from a vendor that rhymes with "Nentor" and the little pitfalls tend to be fairly well known by people who have a need for this.
Should #[ramfunc] imply #[inline(never)]?
:+1:
Bikeshed the attribute name and linker section name?
I'm fine with ramfunc.
To this end, #[ramfunc] won't accept generic functions because if the generic function has several instantiations all of them would end in the same linker section.
Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway, but I might be misunderstanding how the generic instantiations get made. In any event it seems like it might be more useful to be able to have ramfunc generics than not. Since calling a function from a ramfunc doesn't make the called function live in RAM unless it gets inlined, you can't even make your own type-specific instantiations which call the generic and be confident it will still be in RAM.
Otherwise this looks good. As @korken89 mentioned it would be really nice if we could specify what section ramfuncs end up in, but I think we also want that for statics, and for things like #59, and even being able to move stacks to CCM. The same consideration about being able to initialise in a single loop would apply to non-.data statics too.
@korken89
Continuing thread https://github.com/rust-embedded/cortex-m-rt/pull/100#discussion_r214315147 here because review comments get hidden when new commits are pushed.
We could support placing functions in CCRAM by adding a #[ccramfunc] attribute but AFAICT this would require all users to define a CCRAM region.
An alternative is to extend the #[ramfunc] to take an integer argument instead of adding a new attribute. This could be generalized to support any number of RAM regions. That is, #[ramfunc(0)] would place a function in .ramfunc0, #[ramfunc(1)] would place a function in .ramfunc1 and so on. Then .ramfunc0 would go in region RAM0, .ramfunc1 in RAM1, and so on.
This can be further generalized to allow placing static mut variables in different RAM regions. At that point, it would make more sense to rename the attribute to just #[ram]. Then #[ram] fn foo() would put foo in .ramfunc0 (no argument is equivalent to passing 0 as the argument), and #[ram(1)] static mut FOO: u32 = 0 (*) would put FOO in .data1 (**). This opens the question of how many regions to support by default? Just 1? And the question of how to let users configure the number of regions? Cargo features (being booleans) seems like a bad fit for this kind of configuration.
(*) Using #[ram] with static variables is tricky and potentially error prone. A static variable could end in .rodata (e.g. a string literal) or in .data (if it contains something with interior mutability or an atomic). The attribute doesn't have enough information (just the item AST) to determine if a static should really go in RAM so a misuse could cause a string literal to be stored in RAM, which would be wasteful.
(**) Similarly the attribute doesn't have enough information to determine if something should go in .bss or .data so putting everything in .data is the safe and conservative choice although it can unnecessarily increase the initialization time.
On the one hand, support for multiple memory regions looks like a breaking change so it would be good to batch it with breaking addition of attributes. On the other hand, I don't have the bandwidth to sit down and design something reasonable at the moment so I would prefer to postpone it for v0.7.0.
Disclaimer: I never had the need to call stuff from RAM so take my comment with a grain of salt.
I'm wondering if having #[ramfunc] imply #[inline(never)] is more error prone or at least less flexible than leaving #[inline(never)] as a separate attribute. Imagine one of the simplest scenario where you need to call a function foo that calls another function bar. You want the call to foo to execute from RAM for $reasons. I might write this:
#[ramfunc]
fn foo() {
    // ..
    bar(args);
    // ..
}
#[ramfunc]
fn bar(args: Args) { .. }
If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.
OTOH, if #[ramfunc] doesn't imply #[inline(never)] then you really should mark foo as #[inline(never)], otherwise it may get inlined into the caller function that's running from Flash. So if we go this route we should have a guideline to mark "entry" RAM functions as #[inline(never)].
Or, instead of that guideline we could add a ramcall!(fun) macro that calls fun through a thunk:
// `ramcall!(fun)` expands into:
{
    #[ramfunc]
    #[inline(never)]
    fn thunk() { fun() } // LLVM will likely inline `fun` into `thunk`
    thunk();
}
Then the recommendation would be to use ramcall!(fun) when you want to execute fun from RAM and the caller is being executed from Flash. This may work correctly even if fun is not a #[ramfunc]. The problem here is how to implement ramcall! so that it supports inputs for fun ergonomically.
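For illustration, here is a hypothetical expansion of something like ramcall!(add, 2, 3), specialized to one concrete signature; ramcall! itself is not part of the crate and the function names are made up:

```rust
// A thunk-based call in the spirit of the proposed `ramcall!` macro,
// written out by hand for a fixed `fn(u32, u32) -> u32` signature.
fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn call_add_from_ram(a: u32, b: u32) -> u32 {
    #[link_section = ".ramfunc.add_thunk"] // illustrative section name
    #[inline(never)] // the thunk itself must never be inlined into Flash code
    fn thunk(a: u32, b: u32) -> u32 {
        add(a, b) // LLVM will likely inline `add` into `thunk`
    }
    thunk(a, b)
}

fn main() {
    assert_eq!(call_add_from_ram(2, 3), 5);
}
```

The open ergonomics problem mentioned above is visible here: each distinct callee signature needs its own thunk, which is what a macro would have to generate.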
If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.
True, but why would you do that? If you wanted to make sure it's inlined, why not declare it just #[inline(always)], and if you don't care, just don't declare it as #[ramfunc]; then it's up to the compiler to decide whether to inline or not.
On inline:
Can #[ramfunc] detect if the user has already applied an #[inline] attribute, so that by default ramfuncs would be #[inline(never)] but could be overridden with #[inline] on a per-function basis? What happens at the moment if you put #[inline] and #[ramfunc] on something?
On sections:
I quite like the idea of having ram0, ram1, ram2, etc etc, where the user can define where they live in their memory.x to map to CCM/ITCM/DTCM/backup SRAM. I'd support a lot of regions, like 8 or 16. Higher end chips commonly have multiple normal SRAMs, plus ITCM/DTCM (or just CCM in other cases), backup SRAM, and you might well have multiple external mapped memories too. I would think the user configuration of number of sections can be part of memory.x; either all sections need to be defined (and most just have an address and length of 0x0), or defaults are applied for any that are undefined. Here's an example from a C RTOS. Note ram0 is made up of ram1 and ram2 (which are contiguous but distinct SRAMs).
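For illustration, a memory.x along these lines (region names, origins, and sizes are all made up):

```text
/* hypothetical memory.x sketch with multiple user-defined RAM regions */
MEMORY
{
  FLASH : ORIGIN = 0x08000000, LENGTH = 1M
  RAM0  : ORIGIN = 0x20000000, LENGTH = 128K  /* general-purpose SRAM */
  RAM1  : ORIGIN = 0x10000000, LENGTH = 64K   /* CCM / TCM            */
}
```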
I think we'd really need to be able to differentiate bss sections from data sections too: one common use of CCM is process stacks and they're likely several kB in size but all zero initially; it would be a huge waste to put them in .data. Perhaps just use #[ramdata(0)] vs #[rambss(0)] etc.
There are related issues around setting up memory protection units with details of the sections and which ones should have which attributes too.
@therealprof
True, but why would you do that? If you wanted to make sure it's inlined
Because you are not 100% sure whether inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].
@adamgreig
On inline
That might work. It seems #[ramfunc] can see all the other attributes regardless of whether they come before or after it.
On sections:
But having more memory regions is not zero cost. The loop for initializing .{bss,data}{0,1,...N} will be kept in the binary even if the section is empty. You already see this today with .bss and .data: the compiler doesn't know if a section is empty because that information is known only after linking. Thus the number should be configurable and not fixed.
Manually placing things in .bss is unsafe (e.g. placing static mut FOO: Enum = Enum::NonZeroVariant is UB) so the attribute should indicate that the operation is unsafe and using the attribute should be rejected by #[deny(unsafe_code)].
re: generics
Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway
Not necessarily. The default is to use multiple codegen units -- basically the Rust source code is split into chunks and each one is codegen-ed independently of the others. A simple example of an instance that is lowered to machine code but still GC-ed: you have foo that on some condition calls bar, which in turn calls baz::<i32>. In a multiple-codegen-units scenario foo and bar are processed separately. bar causes baz::<i32> to be instantiated. While translating foo it is observed that the branch that calls bar is unreachable, so it gets eliminated. Now it's the job of the linker to discard the unused baz::<i32> instance.
The scenario you are picturing is probably only achievable using fat LTO (codegen-units = 1), but I'm not 100% sure it's guaranteed to occur.
Because you are not 100% sure whether you inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].
I don't buy it. People who fiddle with such optimisations should also be capable of figuring out whether inlining is worth it or not and not leave such a decision to LLVM... After all small changes in rustc, LLVM or the program may change the inlining decision, so in doubt an automatic decision shouldn't be trusted anyway.
The main aspect here is respecting the declared wish of the developer: if a function is declared #[ramfunc] it has to go to RAM, the way to ensure that is by declaring #[inline(never)]. If there's a better way to ensure that -- no matter what --  a #[ramfunc] function will always end up in RAM (inlined in another #[ramfunc] or standalone), that's fine by me. Something that should not happen is that the compiler decides it would be better to inline a #[ramfunc] into code residing in flash...
@japaric wrote this:
Sure you can add #[ramfunc] to bar if you wrote it but if you are calling third party code, like libm, you have pretty much no control over whether something like sinf will end in Flash or RAM.
I recently had the need to write a function that is guaranteed to run from RAM (no access to Flash allowed before the function is finished). I found it impossible to do that in Rust, for the reason mentioned in the quote. Unfortunately that makes this feature useless for my use case. I don't know what could be done to change that.
I left an experience report here: https://github.com/rust-embedded/cortex-m-rt/issues/42#issuecomment-559061416
Context for my use case - just wanted to see if there are updates for this thread. I happen to be working on a driver that writes a blob of data to internal flash. As RAM functions require all dependent functions to be located in RAM, I was wondering if it makes sense to have board-specific HALs containing an NVMC abstraction that uses the attribute for the relevant flash operations.
- I'm assuming not all vendors or boards support code in flash writing to flash. In such a case, does it make sense to use a board-specific HAL (even if one's available) or would it be better to roll your own low-level register-based impl, given that RAM functions only work if all dependent functions are also in RAM?
- I'm not well-versed with C; does C (or a C compiler) have better support for this?