cortex-m-rt
                        [RFC] `#[ramfunc]`
This is a 2 in 1 RFC and implementation PR. This builds on top of #90 so you can ignore the first commit. See #42 for some background information.
@rust-embedded/cortex-m Please review this RFC and if you are in favor approve this PR.
Summary
Add a #[ramfunc] attribute to this crate. This attribute can be applied to
functions to place them in RAM.
Motivation
Running functions from RAM may result in faster execution times. Thus placing "hot code" in RAM may improve the performance of an application.
Design and rationale
Expansion
The #[ramfunc] attribute will expand into a #[link_section = ..]
attribute that will place the function in a .ramfunc.$hash section. Each
#[ramfunc] function will be placed in a different linker section (in a
similar fashion to -ffunction-sections) to let the linker GC (Garbage Collect)
the unused functions. To this end, #[ramfunc] won't accept generic functions
because if the generic function has several instantiations all of them would end
in the same linker section.
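As a host-side sketch, the expansion would be roughly the following (the `.ramfunc.1a2b3c` section suffix is illustrative, not the crate's actual hash scheme):

```rust
// What `#[ramfunc] fn hot() -> u32 { .. }` might expand to. The section
// name suffix stands in for the per-function hash; on the target, the
// linker script collects all `.ramfunc.*` input sections into RAM.
#[link_section = ".ramfunc.1a2b3c"]
#[inline(never)] // keep the symbol out of line so the section placement matters
fn hot() -> u32 {
    42
}

fn main() {
    // On the target `hot` would execute from RAM after Reset copies the
    // `.ramfunc` section over; on a host this just runs normally.
    assert_eq!(hot(), 42);
}
```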
link.x will be tweaked to instruct the linker to collect all the input
sections of the form .ramfunc.* into a single .ramfunc output section that
will be placed in RAM.
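A hypothetical sketch of the link.x addition (the `__sramfunc` / `__eramfunc` / `__siramfunc` symbol names are illustrative, not necessarily what the implementation uses):

```text
/* hypothetical sketch, not the actual link.x contents */
.ramfunc : ALIGN(4)
{
  . = ALIGN(4);
  __sramfunc = .;
  *(.ramfunc .ramfunc.*);
  . = ALIGN(4);
  __eramfunc = .;
} > RAM AT > FLASH

/* load address in Flash, used by Reset to copy the section into RAM */
__siramfunc = LOADADDR(.ramfunc);
```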
The Reset handler will be updated to initialize the .ramfunc section before
the user entry point is called.
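That initialization would mirror the existing .data copy loop. A host-testable sketch of the word-copy routine (on the target the three pointers would come from linker-provided symbols; here they are exercised with plain arrays):

```rust
// Word-copy loop in the style of Reset's `.data` initialization. On the
// target, `dst`/`end` would span `.ramfunc` in RAM and `src` would point
// at its load address in Flash (symbol names for those are hypothetical).
unsafe fn init_ram_section(mut dst: *mut u32, end: *mut u32, mut src: *const u32) {
    while dst < end {
        dst.write(src.read());
        dst = dst.add(1);
        src = src.add(1);
    }
}

fn main() {
    let src = [0xdead_beef_u32, 0x0bad_cafe];
    let mut dst = [0u32; 2];
    unsafe {
        init_ram_section(dst.as_mut_ptr(), dst.as_mut_ptr().add(2), src.as_ptr());
    }
    assert_eq!(dst, src);
}
```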
Why an attribute?
This is implemented as an attribute to make it composable with the attributes to be added in RFC #90. For example, you can place an exception handler in RAM. The compiler won't generate a veneer in this case because the address of the RAM location will be placed in the vector table.
#[exception(SysTick)]
#[ramfunc]
fn systick() {
    // do stuff
}
Why a separate linker section?
Using a separate section lets the compiler / linker assign the right ELF flags
to .ramfunc (i.e. "AX"). That way .ramfunc shows up as an allocated and executable section in the output of readelf -S and gets disassembled by objdump -Cd.
$ readelf -S app
  [ 1] .vector_table     PROGBITS        08000000 010000 000400 00   A  0   0  4
  [ 2] .text             PROGBITS        08000400 010400 0003ea 00  AX  0   0  2
  [ 3] .rodata           PROGBITS        080007ec 020020 000000 00  WA  0   0  4
  [ 4] .data             PROGBITS        20000000 020000 000004 00  WA  0   0  4
  [ 5] .ramfunc          PROGBITS        20000004 020004 00001c 00  AX  0   0  4
  [ 6] .bss              NOBITS          20000020 000000 000000 00  WA  0   0  4
$ arm-none-eabi-objdump -Cd app
Disassembly of section .text:
08000400 <Reset>:
 8000400:       f000 f9ee       bl      80007e0 <DefaultPreInit>
 (..)
Disassembly of section .ramfunc:
20000004 <SysTick>:
20000004:       b580            push    {r7, lr}
(..)
This also means that .data would only contain ... data so the section can be
inspected (not disassembled) separately.
$ arm-none-eabi-objdump -s -j .data app
Contents of section .data:
 20000000 01000000                             ....
The downside (?) is that the default format of size will report .ramfunc
under text.
$ size app
   text    data     bss     dec     hex filename
   2066       4       4    2074     81a app
But in any case it's (usually) better to look at the output in System V format.
$ size -Ax app
app  :
section             size         addr
.vector_table      0x400    0x8000000
.text              0x3ea    0x8000400
.rodata              0x0    0x80007ec
.data                0x4   0x20000000
.ramfunc            0x28   0x20000004
.bss                 0x4   0x2000002c
Unresolved questions
- Is there a reliable way to initialize both .data and .ramfunc in a single for loop? (see init_data in the Reset function) Right now I initialize each one separately and this produces more code. I haven't tried initializing them in a single pass.
- Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free to inline functions marked as #[ramfunc] into other functions. That could cause the function to not end in RAM.
- Bikeshed the attribute name and linker section name? IAR and TI use the name ramfunc for this functionality.
Oh and I have not tested this on hardware. I have only looked at the output of objdump so it would be good if someone tested it to see if it actually works ...
@japaric This is fantastic! Especially the ability to have exception/interrupt handlers without veneer.
Should #[ramfunc] imply #[inline(never)]? Right now the compiler is free to inline functions marked as #[ramfunc] into other functions. That could cause the function to not end in RAM.
Yes, we should do that.
Something that should be mentioned in the docs is that #[ramfunc] does not automatically propagate through the call stack. By this I mean that code like:
#[ramfunc]
fn foo() {
    // ..
    bar();
    // ..
}
fn bar() { .. }
Could result in foo being executed from RAM and bar being executed from Flash unless bar gets inlined into foo but that would be up to the compiler.
Sure you can add #[ramfunc] to bar if you wrote it but if you are calling third party code, like libm, you have pretty much no control over whether something like sinf will end in Flash or RAM.
Also I agree with @therealprof, a ramfunc should not be inlined, that destroys the purpose. :)
Writing RAM functions was a bit of an art before: you had to make sure the entire call tree is in RAM. I do not believe we can auto-promote other functions to RAM, though, and more importantly I'm not sure we should.
Sounds good to me. I'm quite used to this sort of thing from working with big-name RTOS from a vendor that rhymes with "Nentor" and the little pitfalls tend to be fairly well known by people who have a need for this.
Should #[ramfunc] imply #[inline(never)]?
:+1:
Bikeshed the attribute name and linker section name?
I'm fine with ramfunc.
To this end, #[ramfunc] won't accept generic functions because if the generic function has several instantiations all of them would end in the same linker section.
Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway, but I might be misunderstanding how the generic instantiations get made. In any event it seems like it might be more useful to be able to have ramfunc generics than not. Since calling a function from a ramfunc doesn't make the called function live in RAM unless it gets inlined, you can't even make your own type-specific instantiations which call the generic and be confident it will still be in RAM.
Otherwise this looks good. As @korken89 mentioned it would be really nice if we could specify what section ramfuncs end up in, but I think we also want that for statics, and for things like #59, and even being able to move stacks to CCM. The same consideration about being able to initialise in a single loop would apply to non-.data statics too.
@korken89
Continuing thread https://github.com/rust-embedded/cortex-m-rt/pull/100#discussion_r214315147 here because review comments get hidden when new commits are pushed.
We could support placing functions in CCRAM by adding a #[ccramfunc] attribute but AFAICT this would require all users to define a CCRAM region.
An alternative is to extend the #[ramfunc] to take an integer argument instead of adding a new attribute. This could be generalized to support any number of RAM regions. That is, #[ramfunc(0)] would place a function in .ramfunc0, #[ramfunc(1)] would place a function in .ramfunc1 and so on. Then .ramfunc0 would go in region RAM0, .ramfunc1 in RAM1, and so on.
This can be further generalized to allow placing static mut variables in different RAM regions. At that point, it would make more sense to rename the attribute to just #[ram]. Then #[ram] fn foo() would put foo in .ramfunc0 (no argument is equivalent to passing 0 as the argument), and #[ram(1)] static mut FOO: u32 = 0 (*) would put FOO in .data1 (**). This opens the question of how many regions to support by default? Just 1? And the question of how to let users configure the number of regions? Cargo features (being booleans) seems like a bad fit for this kind of configuration.
(*) Using #[ram] with static variables is tricky and potentially error prone. A static variable could end in .rodata (e.g. a string literal) or in .data (if it contains something with interior mutability or an atomic). The attribute doesn't have enough information (just the item AST) to determine if a static should really go in RAM so a misuse could cause a string literal to be stored in RAM, which would be wasteful.
(**) Similarly the attribute doesn't have enough information to determine if something should go in .bss or .data so putting everything in .data is the safe and conservative choice although it can unnecessarily increase the initialization time.
On the one hand, support for multiple memory regions looks like a breaking change so it would be good to batch it with breaking addition of attributes. On the other hand, I don't have the bandwidth to sit down and design something reasonable at the moment so I would prefer to postpone it for v0.7.0.
Disclaimer: I never had the need to call stuff from RAM so take my comment with a grain of salt.
I'm wondering if having #[ramfunc] imply #[inline(never)] is more error prone or at least less flexible than leaving #[inline(never)] as a separate attribute. Imagine one of the simplest scenario where you need to call a function foo that calls another function bar. You want the call to foo to execute from RAM for $reasons. I might write this:
#[ramfunc]
fn foo() {
    // ..
    bar(args);
    // ..
}
#[ramfunc]
fn bar(args: Args) { .. }
If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.
OTOH, if #[ramfunc] doesn't imply #[inline(never)] then you really should mark foo as #[inline(never)], otherwise it may get inlined into the caller function that's running from Flash. So if we go this route we should have a guideline to mark "entry" RAM functions as #[inline(never)].
Or, instead of that guideline we could add a ramcall!(fun) macro that calls fun through a thunk:
// `ramcall!(fun)` expands into:
{
    #[ramfunc]
    #[inline(never)]
    fn thunk() { fun() } // LLVM will likely inline `fun` into `thunk`
    thunk();
}
Then the recommendation would be to use ramcall!(fun) when you want to execute fun from RAM and the caller is being executed from Flash. This may work correctly even if fun is not a #[ramfunc]. The problem here is how to implement ramcall! so that it supports inputs for fun ergonomically.
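For illustration, here is a hypothetical expansion of something like ramcall!(add, 2, 3), specialized to one concrete signature; ramcall! itself is not part of the crate and the function names are made up:

```rust
// A thunk-based call in the spirit of the proposed `ramcall!` macro,
// written out by hand for a fixed `fn(u32, u32) -> u32` signature.
fn add(a: u32, b: u32) -> u32 {
    a + b
}

fn call_add_from_ram(a: u32, b: u32) -> u32 {
    #[link_section = ".ramfunc.add_thunk"] // illustrative section name
    #[inline(never)] // the thunk itself must never be inlined into Flash code
    fn thunk(a: u32, b: u32) -> u32 {
        add(a, b) // LLVM will likely inline `add` into `thunk`
    }
    thunk(a, b)
}

fn main() {
    assert_eq!(call_add_from_ram(2, 3), 5);
}
```

The open ergonomics problem mentioned above is visible here: each distinct callee signature needs its own thunk, which is what a macro would have to generate.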
If #[ramfunc] implies #[inline(never)] you forbid LLVM from inlining bar into foo which could hamper optimizations. In this case inlining is something that may be favorable so it's best to leave LLVM do its thing.
True, but why would you do that? If you wanted to make sure it's inlined, why not declare it just #[inline(always)], and if you don't care, just don't declare it as #[ramfunc]; then it's up to the compiler to decide whether to inline or not.
On inline:
Can #[ramfunc] detect if the user has already applied an #[inline] attribute, so that by default ramfuncs would be #[inline(never)] but could be overridden with #[inline] on a per-function basis? What happens at the moment if you put #[inline] and #[ramfunc] on something?
On sections:
I quite like the idea of having ram0, ram1, ram2, etc etc, where the user can define where they live in their memory.x to map to CCM/ITCM/DTCM/backup SRAM. I'd support a lot of regions, like 8 or 16. Higher end chips commonly have multiple normal SRAMs, plus ITCM/DTCM (or just CCM in other cases), backup SRAM, and you might well have multiple external mapped memories too. I would think the user configuration of number of sections can be part of memory.x; either all sections need to be defined (and most just have an address and length of 0x0), or defaults are applied for any that are undefined. Here's an example from a C RTOS. Note ram0 is made up of ram1 and ram2 (which are contiguous but distinct SRAMs).
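For illustration, a memory.x along these lines (region names, origins, and sizes are all made up):

```text
/* hypothetical memory.x sketch with multiple user-defined RAM regions */
MEMORY
{
  FLASH : ORIGIN = 0x08000000, LENGTH = 1M
  RAM0  : ORIGIN = 0x20000000, LENGTH = 128K  /* general-purpose SRAM */
  RAM1  : ORIGIN = 0x10000000, LENGTH = 64K   /* CCM / TCM            */
}
```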
I think we'd really need to be able to differentiate bss sections from data sections too: one common use of CCM is process stacks and they're likely several kB in size but all zero initially; it would be a huge waste to put them in .data. Perhaps just use #[ramdata(0)] vs #[rambss(0)] etc.
There are related issues around setting up memory protection units with details of the sections and which ones should have which attributes too.
@therealprof
True, but why would you do that? If you wanted to make sure it's inlined
Because you are not 100% sure whether inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].
@adamgreig
On inline
That might work. It seems #[ramfunc] can see all the other attributes regardless of whether they come before or after it.
On sections:
But having more memory regions is not zero cost. The loop for initializing .{bss,data}{0,1,...N} will be kept in the binary even if the section is empty. You already see this today with .bss and .data: the compiler doesn't know if a section is empty because that information is known only after linking. Thus the number should be configurable and not fixed.
Manually placing things in .bss is unsafe (e.g. placing static mut FOO: Enum = Enum::NonZeroVariant is UB) so the attribute should indicate that the operation is unsafe and using the attribute should be rejected by #[deny(unsafe_code)].
re: generics
Is that actually a problem? Presumably if the instantiation was generated it's because it's in use, so wouldn't be eliminated by the linker later anyway
Not necessarily. The default is to use multiple codegen units -- basically the Rust source code is split into chunks and each one is codegen-ed independently of the others. A simple example of an instance that is lowered to machine code but still GC-ed: you have foo that on some condition calls bar, which in turn calls baz::<i32>. In a multiple-codegen-units scenario foo and bar are processed separately. bar causes baz::<i32> to be instantiated. While translating foo it is observed that the branch that calls bar is unreachable, so it gets eliminated. Now it's the job of the linker to discard the unused baz::<i32> instance.
The scenario you are picturing is probably only achievable using fat LTO (codegen-units = 1), but I'm not 100% sure it's guaranteed to occur.
Because you are not 100% sure whether you inlining it is better or not -- that's the job of LLVM. What you want to be sure of is that if it's not inlined then bar should still be placed in RAM and you would accomplish that with #[ramfunc] #[inline] fn bar(..) but #[ramfunc] must not imply #[inline(never)].
I don't buy it. People who fiddle with such optimisations should also be capable of figuring out whether inlining is worth it or not and not leave such a decision to LLVM... After all small changes in rustc, LLVM or the program may change the inlining decision, so in doubt an automatic decision shouldn't be trusted anyway.
The main aspect here is respecting the declared wish of the developer: if a function is declared #[ramfunc] it has to go to RAM, the way to ensure that is by declaring #[inline(never)]. If there's a better way to ensure that -- no matter what --  a #[ramfunc] function will always end up in RAM (inlined in another #[ramfunc] or standalone), that's fine by me. Something that should not happen is that the compiler decides it would be better to inline a #[ramfunc] into code residing in flash...
@japaric wrote this:
Sure you can add #[ramfunc] to bar if you wrote it but if you are calling third party code, like libm, you have pretty much no control over whether something like sinf will end in Flash or RAM.
I recently had the need to write a function that is guaranteed to run from RAM (no access to Flash allowed before the function is finished). I found it impossible to do that in Rust, for the reason mentioned in the quote. Unfortunately that makes this feature useless for my use case. I don't know what could be done to change that.
I left an experience report here: https://github.com/rust-embedded/cortex-m-rt/issues/42#issuecomment-559061416
Context for my use case - just wanted to see if there are updates for this thread. I happen to be working on a driver that writes a blob of data to internal flash. As RAM functions require all dependent functions to be located in RAM, I was wondering if it makes sense to have board-specific HALs containing an NVMC abstraction that uses the attribute for the relevant flash operations.
- I'm assuming not all vendors or boards support code in flash writing to flash. In such a case, does it make sense to use a board-specific HAL (even if one's available) or would it be better to roll your own low-level register-based impl, given that RAM functions only work if all dependent functions are also in RAM?
- I'm not well-versed with C; does C (or a C compiler) have better support for this?