Stable assembly operations
Triage (2018-08-21)
A pre-RFC discussing what should go in core::arch::arm has been opened in #184.
Update - 2018-07-27
Intrinsics like __NOP are now available in core::arch::arm. The path towards stabilization is being discussed in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.
Help wanted
We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm. Details in https://github.com/rust-lang-nursery/embedded-wg/issues/63#issuecomment-408509178.
One of the features that ties embedded development to the nightly channel is the asm! macro, and
by extension the global_asm! macro.
In some cases it's possible to turn an unstable asm! call into a stable FFI call that invokes some
subroutine that comes from a pre-compiled external assembly file. This alternative comes at the cost
of a function call overhead per asm! invocation.
In other cases the function call overhead of the FFI call breaks the intended semantics of the
original asm! call; this can be seen when reading registers like the Program Counter (PC) or the
Link Register (LR).
The asm! feature is hard to stabilize because it's directly tied to LLVM; rustc literally passes
the contents of asm! invocations to LLVM's internal assembler. It's not possible to guarantee that
the syntax of LLVM assembly won't change across LLVM releases so stabilizing the asm! feature in
its current form is not possible.
This ticket is about exploring making operations that require assembly available on the stable
channel. This proposal is not about stabilizing the asm! macro itself.
The main idea is that everything that requires assembly and can be implemented as external assembly
files should use that approach. For everything that requires inlining the assembly operation the
proposal is that the core crate will provide such functionality.
This idea is not unlike what's currently being done in stdsimd land: core::arch provides
functions that are thin wrappers around unstable LLVM intrinsics that provide SIMD functionality.
Similarly core::asm (module up for bikeshedding) would provide functionality that requires asm!
but in a stable fashion. Example below:
mod asm {
    mod arm {
        #[inline(always)]
        #[stable]
        pub fn pc() -> u32 {
            let r;
            unsafe { asm!("mov $0,R15" : "=r"(r) ::: "volatile") }
            r
        }

        #[inline(always)]
        #[stable]
        pub fn lr() -> u32 {
            let r;
            unsafe { asm!("mov $0,R14" : "=r"(r) ::: "volatile") }
            r
        }
    }
}
This way the functionality can be preserved across LLVM upgrades; the maintainers of the core crate
would update the implementation to match the current LLVM assembly syntax.
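For comparison, here is a minimal sketch of the same thin-wrapper idea written with the `core::arch::asm!` syntax that was stabilized later, in Rust 1.59. That is not the syntax used in this thread. The function name `nop` and the empty fallback for other architectures are assumptions made for illustration, not an API from `core`. Callers only ever see the safe function signature, so the body is free to change.

```rust
/// Executes one no-op instruction on the architectures this sketch knows about.
#[cfg(any(
    target_arch = "x86",
    target_arch = "x86_64",
    target_arch = "arm",
    target_arch = "aarch64",
    target_arch = "riscv32",
    target_arch = "riscv64"
))]
#[inline(always)]
pub fn nop() {
    // SAFETY: `nop` takes no operands and touches neither memory nor flags.
    unsafe { core::arch::asm!("nop", options(nomem, nostack, preserves_flags)) }
}

/// Hypothetical fallback: on any other architecture the wrapper does nothing.
#[cfg(not(any(
    target_arch = "x86",
    target_arch = "x86_64",
    target_arch = "arm",
    target_arch = "aarch64",
    target_arch = "riscv32",
    target_arch = "riscv64"
)))]
#[inline(always)]
pub fn nop() {}

/// Small helper used to exercise the wrapper; returns how many no-ops ran.
pub fn spin(count: u32) -> u32 {
    for _ in 0..count {
        nop();
    }
    count
}
```

The `cfg` attributes sit on whole functions, so every target compiles exactly one definition of `nop` and the public signature is the same everywhere.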
TODO (outdated)
- Come up with a list of assembly operations that are used in practice and that would need to be
  provided in core::asm.
  - I would appreciate some help making a Proof of Concept PR against cortex-m that moves most of the assembly operations to external assembly files by enabling some Cargo feature. What can't be moved to assembly files would be a candidate for inclusion in core::asm.
  - AVR
  - MSP430
  - RISCV
- [ ] The team will bring up this idea to the lang / compiler team during the Rust All Hands event.
cc @gnzlbg @nagisa could we get your thoughts on this? does this sound feasible, or does this sound like a bad idea?
cc @dvc94ch @dylanmckay @pftbest could you help me identify assembly operations that require inline assembly on AVR, MSP430 and RISCV?
Is there any sequence of instructions for which a single call to an asm! macro containing the whole sequence is not equivalent to a sequence of asm! macro calls with one call per instruction?
I am thinking here about compiling in debug mode without optimizations, where the generated code might differ, potentially in ways that break it.
I’ve always been a strong proponent of the "intrinsic" functions that map down to a single instruction. For example, instead of writing asm!("cpsid" ...), you’d call a function that would do the same, look the same and behave mostly the same as the asm call. Not only is it harder to mess up using these intrinsics, but the compiler also gets more information about the code.
Of course they have their downsides, but for most of the use cases in embedded, intrinsics are more than sufficient. In my experience using the IAR toolchain, which exposes such intrinsics for most of the important instructions, I’ve only had to maintain assembly to implement exactly two things: atomic byte increment and the context switch handler. Both of them implemented in an external assembly file.
I believe that we could work on adding intrinsics to rustc (LLVM doesn’t have a platform intrinsic for ARM cpsid, for example, but it has a llvm.returnaddress for lr, llvm.read_cycle_counter for cycle count and llvm.read_register to read arbitrary registers) and then the wrappers to stdsimd.
... So basically, what your proposal says.
@gnzlbg yes, the compiler is free to insert whatever code it wishes between the separate asm! statements to satisfy the constraints. For example it could decide to spill and reload the registers between two asm! calls.
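To make the point concrete, here is a small sketch in which a two-instruction sequence stays inside one `asm!` block, so the compiler has no opportunity to place spills or reloads between the instructions. It uses the later stabilized `core::arch::asm!` syntax and x86_64 instructions purely so it can run on a desktop host. The function name `double_plus_one` and the pure-Rust fallback are assumptions for illustration.

```rust
/// Computes `2 * x + 1` (wrapping) with both instructions in a single asm block.
#[cfg(target_arch = "x86_64")]
pub fn double_plus_one(x: u64) -> u64 {
    let mut v = x;
    // SAFETY: the block only reads and writes the register bound to `v`.
    unsafe {
        core::arch::asm!(
            "add {v}, {v}", // v = v + v
            "add {v}, 1",   // v = v + 1
            v = inout(reg) v,
            options(pure, nomem, nostack)
        );
    }
    v
}

/// Fallback for other hosts: same result, computed in plain Rust.
#[cfg(not(target_arch = "x86_64"))]
pub fn double_plus_one(x: u64) -> u64 {
    x.wrapping_mul(2).wrapping_add(1)
}
```

Splitting the two `add` instructions into two separate `asm!` calls would compute the same value here, but the register allocator would then be allowed to move `v` between the calls.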
Thanks @nagisa that makes sense.
So my opinion is that it only makes sense to do this after we have stabilized inline assembly, which kind of defeats the point of doing this in the first place.
IIUC the idea behind this is that if we expose the hardware ISA via intrinsics then nothing will break if LLVM changes the syntax or semantics of inline assembly because we will be able to fix this in core.
But this is the problem: core will break.
That is, somebody will need to fix core, and the amount of work required to fix core is linearly proportional to the usage of inline assembly within core. This might be a lot of work already, but if we go this route that will turn into a titanic amount of work.
The same would happen if we decided to add another backend to rustc and had to reimplement all usages of inline assembly in "Cretonne-syntax".
So IMO the only way to make upgrading to a new syntax/backend realistic is to turn the porting effort from O(N) in the usage of inline assembly to O(1).
The only way I can think of to achieve this is to stabilize the inline assembly macro enough (e.g. like this https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443), so that upgrading can be done in Rust by "just" mapping Rust's inline assembly syntax to whatever syntax a new or upgraded backend uses. This will be a lot of work, but at least it's independent of how often inline assembly is used.
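As a toy illustration of why such a mapping is O(1) in the number of call sites, the sketch below rewrites a backend-neutral template with `{N}` placeholders into the `$N` placeholder form seen in the LLVM-style examples in this thread. The `{N}` convention and the function name `to_llvm_template` are assumptions for the example. A real mapping would also have to translate constraints, clobbers and options.

```rust
/// Rewrites `{N}` operand placeholders into LLVM-style `$N` placeholders.
/// Anything that is not a well-formed `{digits}` group is copied unchanged.
pub fn to_llvm_template(template: &str) -> String {
    let mut out = String::with_capacity(template.len());
    let mut chars = template.chars().peekable();

    while let Some(c) = chars.next() {
        if c != '{' {
            out.push(c);
            continue;
        }
        // Collect the decimal operand index that follows the opening brace.
        let mut digits = String::new();
        while let Some(&d) = chars.peek() {
            if d.is_ascii_digit() {
                digits.push(d);
                chars.next();
            } else {
                break;
            }
        }
        if !digits.is_empty() && chars.peek() == Some(&'}') {
            chars.next(); // consume the closing brace
            out.push('$');
            out.push_str(&digits);
        } else {
            // Not a placeholder after all: emit what was consumed as-is.
            out.push('{');
            out.push_str(&digits);
        }
    }
    out
}
```

Every wrapper keeps its template in the neutral form, and only this one function changes when the backend syntax does.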
Once we are there, we might just as well stabilize inline assembly instead of pursuing this. Libraries that implement this can be developed in the ecosystem, and "core" intrinsics can continue to be added to stdsimd which already exposes some intrinsics via inline assembly (some core stdsimd modules like run-time feature detection are built around inline assembly).
Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.
The intrinsics would be implemented in the backend, rather than libcore. While it is true that it would increase the burden when upgrading a backend, it wouldn’t be any greater than the burden of adapting whatever we stabilise as our inline assembly implementation.
Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.
With volatile assembly statements, the "code" would be limited to code that would be necessary to satisfy the constraints of the asm! statements. So, I think just moves from and to memory.
why can't global_asm! be stabilized? Isn't that just like an external assembly file?
for riscv it would be quite a few intrinsics I believe. this is what I'm doing to r/w csr regs:
#[cfg(target_arch = "riscv")]
macro_rules! csr_asm {
    ($op:ident, $csr:expr, $value:expr) => (
        {
            let res: usize;
            unsafe {
                asm!(concat!(stringify!($op), " $0, ", stringify!($csr), ", $1")
                     : "=r"(res)
                     : "r"($value)
                     :
                     : "volatile");
            }
            res
        }
    )
}
the $csr value isn't an operand that can be loaded from a register, so there would need to be an intrinsic for each csr.
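A hedged sketch of what "one intrinsic per CSR" could look like: a macro stamps out one named read function per register, because the CSR name has to be part of the instruction text. This uses the later stabilized `core::arch::asm!` syntax. The macro name `csr_reader`, the generated function names and the host fallback that returns 0 are all assumptions for illustration, and reading these particular CSRs on real hardware requires machine mode.

```rust
/// Generates one read function per CSR, with the CSR name baked into the
/// instruction text at compile time.
macro_rules! csr_reader {
    ($fn_name:ident, $csr:literal) => {
        #[cfg(any(target_arch = "riscv32", target_arch = "riscv64"))]
        #[inline(always)]
        pub fn $fn_name() -> usize {
            let value: usize;
            // SAFETY: `csrr` only copies the CSR into a general purpose register.
            unsafe { core::arch::asm!(concat!("csrr {0}, ", $csr), out(reg) value) };
            value
        }

        /// Placeholder for non-RISC-V hosts so the sketch still compiles there.
        #[cfg(not(any(target_arch = "riscv32", target_arch = "riscv64")))]
        pub fn $fn_name() -> usize {
            0
        }
    };
}

csr_reader!(read_mstatus, "mstatus");
csr_reader!(read_mcause, "mcause");
```

The macro keeps the per-register boilerplate small, but the set of functions still grows with the number of CSRs, which is the concern raised above.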
For reference, here is ARM's C Language Extensions 2.0 document:
ARM® C Language Extensions Release 2.0
I believe these were implemented by ARM's in-house compilers and by GCC according to this page: 6.59.7 ARM C Language Extensions (ACLE)
I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics. Having a standard to point to should make it easier to get them implemented.
For instance, section "8.4 Hints" describes wfi, wfe, sev, sevl, yield, and dbg. "8.7" describes nop, which I think already has an LLVM intrinsic defined since pretty much all instruction sets have some variant of that instruction. "10.1 Special Registers" covers a variety of special registers.
could you help me identify assembly operations that require inline assembly on AVR
I've taken a look over the full instruction set listing, here's what stands out
Assembly operations that are often required for nontrivial programs:
- cli/sei instructions to globally disable or enable interrupts
Somewhat more obscure stuff:
- If watchdog timers are enabled by the programmer, the wdr instruction needs to be executed regularly, otherwise the chip will reset. WDR is short for "watchdog timer reset"
- The des instruction for DES encryption. Most existing software would do this in, well, software, so not major
- The sleep instruction - a la std::thread::yield()
Unlike AVR-GCC, LLVM transparently handles accesses to and from program memory, meaning that that whole class of operations doesn't require asm! magic.
Although currently unimplemented, I believe LLVM's atomic intrinsics could be mapped directly to the RMW atomic instructions (FWIW these map to something like cli; <operation>; sei).
I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics.
stdsimd does offer some v7 and v8 intrinsics, at least rev and rbit.
I discussed this with @alexcrichton during Rust All Hands and he said he was fine with adding a stable API for assembly ops that have an unstable implementation (e.g. inline asm!). We'll make an RFC for the ARM Cortex-M ops, but first we have to figure out exactly what ops need to be inlined (will likely be less than 10 ops, maybe just 5 or so).
@japaric would those go into std::arch?
@gnzlbg That can be bikeshed in the RFC
In order to understand which intrinsics might be needed for the various Cortex-M processors, I extracted these two lists from the CMSIS-Core reference, excluding armv8-only instructions.
The first group of intrinsics is "Core Register Access". These are mainly wrappers around the CPSID, CPSIE, MRS, and MSR instructions, except for get_FPSCR and set_FPSCR.
https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html
// Implemented using CPSID and CPSIE
disable_fault_irq - CPSID f
disable_irq - CPSID i
enable_fault_irq - CPSIE f
enable_irq - CPSIE i
// Mostly implemented using MRS
get_APSR
get_BASEPRI
get_CONTROL
get_FAULTMASK
get_FPSCR - M4, M7
get_IPSR
get_MSP
get_PRIMASK
get_PSP
get_xPSR
// Mostly implemented using MSR
set_APSR
set_BASEPRI
set_CONTROL
set_FAULTMASK
set_FPSCR - M4, M7
set_IPSR
set_MSP
set_PRIMASK
set_PSP
set_xPSR
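As a rough sketch of how `disable_irq` / `enable_irq` are typically used, the code below wraps a closure in an interrupt-free section. It assumes the later stabilized `core::arch::asm!` syntax on ARM targets. On any other host it swaps in a hypothetical `IRQ_MASKED` flag so the control flow can be exercised off-target. All names are illustrative rather than an existing API. Unconditionally re-enabling interrupts at the end is a simplification, since a careful implementation would restore the previous PRIMASK state.

```rust
#[cfg(not(target_arch = "arm"))]
use core::sync::atomic::{AtomicBool, Ordering};

/// Host-only stand-in for the PRIMASK bit (hypothetical, for illustration).
#[cfg(not(target_arch = "arm"))]
static IRQ_MASKED: AtomicBool = AtomicBool::new(false);

#[cfg(target_arch = "arm")]
#[inline(always)]
pub fn disable_irq() {
    // No `nomem` option: the block then also acts as a compiler barrier.
    unsafe { core::arch::asm!("cpsid i", options(nostack, preserves_flags)) }
}

#[cfg(target_arch = "arm")]
#[inline(always)]
pub fn enable_irq() {
    unsafe { core::arch::asm!("cpsie i", options(nostack, preserves_flags)) }
}

#[cfg(not(target_arch = "arm"))]
pub fn disable_irq() {
    IRQ_MASKED.store(true, Ordering::SeqCst);
}

#[cfg(not(target_arch = "arm"))]
pub fn enable_irq() {
    IRQ_MASKED.store(false, Ordering::SeqCst);
}

/// Host-only helper that reports the simulated mask state.
#[cfg(not(target_arch = "arm"))]
pub fn irqs_masked() -> bool {
    IRQ_MASKED.load(Ordering::SeqCst)
}

/// Runs `f` with interrupts masked, then unmasks them again.
pub fn with_irqs_disabled<R>(f: impl FnOnce() -> R) -> R {
    disable_irq();
    let result = f();
    enable_irq();
    result
}
```

Because each wrapper is a single inlined instruction, the critical section adds no function call overhead, which is the main argument for exposing these as intrinsics instead of external assembly.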
The second group of intrinsics provides access to CPU instructions. Each of these is a wrapper around a specific instruction.
Some of these instructions have direct equivalents in core::intrinsics or are used for implementing numeric primitives. There are also many instructions that are likely used in the existing atomics support.
https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__CPU__gr.html
* - in core or core::intrinsics or stdsimd
** - atomic
_NOP
_WFI - Wait For Interrupt
_WFE - Wait For Event
_SEV - Send Event
_BKPT* - Set Breakpoint - core::intrinsics::breakpoint()
_ISB - Instruction Synchronization Barrier
_DSB - Data Synchronization Barrier
_REV* - Reverse Byte Order (32 bit) - u32.swap_bytes()
_REV16* - Reverse Byte Order (16 bit) - u16.swap_bytes()
_REVSH - Reverse Byte Order Signed (16 bit)
_RBIT* - Reverse Bit Order
_ROR* - Rotate Right - u32.rotate_right()
_LDREXB** - LDR Exclusive (8 bit) - Not M0, M0+
_LDREXH** - LDR Exclusive (16 bit) - Not M0, M0+
_LDREXW** - LDR Exclusive (32 bit) - Not M0, M0+
_STREXB** - STR Exclusive (8 bit) - Not M0, M0+
_STREXH** - STR Exclusive (16 bit) - Not M0, M0+
_STREXW** - STR Exclusive (32 bit) - Not M0, M0+
_CLREX** - Remove Exclusive Lock - Not M0, M0+
_SSAT - Signed Saturate - Not M0, M0+
_USAT - Unsigned Saturate - Not M0, M0+
_CLZ* - Count Leading Zeros - u32.leading_zeros()
_RRX - Rotate Right with Extend (32 bit)
_LDRBT - LDRT Unprivileged (8 bit)
_LDRHT - LDRT Unprivileged (16 bit)
_LDRT - LDRT Unprivileged (32 bit)
_STRBT - STRT Unprivileged (8 bit)
_STRHT - STRT Unprivileged (16 bit)
_STRT - STRT Unprivileged (32 bit)
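For the starred entries, the stable integer methods can be written out as thin functions. This is a minimal sketch in which the function names simply mirror the mnemonics and are not an existing API. Whether the compiler lowers each call to the single corresponding instruction is an expectation about code generation, not something this code guarantees. `revsh` is included to show that it can also be expressed with built-ins, as a byte swap followed by sign extension.

```rust
/// REV: reverse the byte order of a 32-bit word.
pub fn rev(x: u32) -> u32 {
    x.swap_bytes()
}

/// REV16, following the mapping given in the list above.
pub fn rev16(x: u16) -> u16 {
    x.swap_bytes()
}

/// REVSH: reverse the bytes of a signed halfword, then sign-extend to 32 bits.
pub fn revsh(x: i16) -> i32 {
    i32::from(x.swap_bytes())
}

/// RBIT: reverse the bit order of a 32-bit word.
pub fn rbit(x: u32) -> u32 {
    x.reverse_bits()
}

/// ROR: rotate right by `n` bits.
pub fn ror(x: u32, n: u32) -> u32 {
    x.rotate_right(n)
}

/// CLZ: count leading zero bits.
pub fn clz(x: u32) -> u32 {
    x.leading_zeros()
}
```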
In the first group, disable_fault_irq, disable_irq, enable_fault_irq, enable_irq seem to be pretty critical. The rest of the get_ / set_ functions are more specialized.
In the second group, NOP, WFI, WFE, SEV, ISB, DSB are ones that I am familiar with and use often. REVSH, RBIT, and RRX seem like they would be primarily for optimization. SSAT and USAT provide more flexibility than the core saturated math primitives by allowing selection of a bit width. LDRxT and STRxT are mainly for unprivileged access checking. The rest should be covered by built-ins and atomics.
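To show what "selection of a bit width" means for SSAT and USAT, here is a portable reference sketch of their arithmetic. The names `ssat` and `usat` and the runtime `bits` parameter are assumptions for illustration. In the real instructions the bit width is an immediate, and this code makes no claim about being compiled down to them.

```rust
/// Saturates `value` to the signed range of a `bits`-wide integer,
/// i.e. `-2^(bits-1) ..= 2^(bits-1) - 1`, for `1 <= bits <= 32`.
pub fn ssat(value: i32, bits: u32) -> i32 {
    assert!((1..=32).contains(&bits), "bit width must be in 1..=32");
    // Compute the bounds in 64 bits so that `bits == 32` does not overflow.
    let max = ((1i64 << (bits - 1)) - 1) as i32;
    let min = (-(1i64 << (bits - 1))) as i32;
    value.clamp(min, max)
}

/// Saturates `value` to the unsigned range `0 ..= 2^bits - 1`,
/// for `0 <= bits <= 31`.
pub fn usat(value: i32, bits: u32) -> u32 {
    assert!(bits <= 31, "bit width must be in 0..=31");
    let max = (1i64 << bits) - 1;
    i64::from(value).clamp(0, max) as u32
}
```

By contrast, `i8::saturating_add` and friends only saturate at the full width of the type, which is why these two instructions are not covered by the existing primitives.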
Do gcc, clang, or msvc provide any of these as functions?
As far as I can tell, all three implement them via the ACLE (ARM C Language Extensions) specification.
gcc - 6.59.7 ARM C Language Extensions (ACLE)
clang - The arm_acle.h header is shown in their documentation
msvc - ARM Intrinsics
All of it?
The arm_acle.h header is exposed via std::arch::{arm,aarch64}. If all you need is arm_acle.h this doesn't even need an RFC. The only thing required for these to land on nightly is for someone to implement the missing functions in coresimd. For this to land in stable, an automatic verification of the header against the ACLE specification must be enabled in stdsimd, and after that, a mini-FCP would probably be enough.
Thanks for the pointer back to coresimd - I looked at this briefly a while back but didn't dig in to figure out the specifics of how to get the additional intrinsics implemented.
coresimd/simd_llvm.rs provides the clue:
extern "platform-intrinsic" {
    pub fn simd_eq<T, U>(x: T, y: T) -> U;
    pub fn simd_ne<T, U>(x: T, y: T) -> U;
    pub fn simd_lt<T, U>(x: T, y: T) -> U;
    pub fn simd_le<T, U>(x: T, y: T) -> U;
    pub fn simd_gt<T, U>(x: T, y: T) -> U;
    pub fn simd_ge<T, U>(x: T, y: T) -> U;
    ...
}
Unfortunately only simd intrinsics are listed there. I'd never seen the platform-intrinsic extern type before, but after a bit of digging I found where they are defined: rust/src/etc/platform-intrinsics/arm.json.
Looking further, stdsimd issue #112 mentions link_llvm_intrinsics which enables using #[link_name="llvm.*"] to access LLVM intrinsics directly.
Implement all x86 vendor intrinsics shows how to create new intrinsics and also gave me enough information to find the list of LLVM arm intrinsics: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td
So, as @gnzlbg points out, we should be able to do this through pull requests to stdsimd.
For ACLE using link_llvm_intrinsics is the way to go. If there is an intrinsic that cannot be implemented that way we'll handle those on a case-by-case basis.
Update: rust-lang-nursery/stdsimd#437 is tracking adding these assembly operations (instructions) to stdsimd.
Discussed (triaged) in the last meeting:
This is more of a nice to have as it's not required for embedded Rust on stable. If we are to get this done by the edition release these are the final deadlines:
- 1.28 (2018-08-02). implement rust-lang-nursery/stdsimd#437
- 1.29 (2018-09-13). subset of most commonly used intrinsics stabilized in beta.
Triage: Several CMSIS intrinsics have been implemented and are now available in core::arch::arm. The linked API documentation may not correctly reflect what's currently available in the latest nightly but you can check the source code: cmsis.rs and dsp.rs.
Most of the functionality in cortex-m that requires external assembly is now provided in core::arch::arm as an unstable API, with the notable exception of the BKPT instruction which will be added in rust-lang-nursery/stdsimd#540 once its API and implementation are decided.
The stabilization path for the non-SIMD subset of these intrinsics is being discussed in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.
Help wanted
We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm.
The RFC needs to cover the points in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-408480125. Some of these points have been discussed in that thread already starting from https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.
What needs to be done:
- Make a list of the non-SIMD intrinsics to stabilize (e.g. DMB, NOP, MSR)
- Decide between ACLE naming style (__nop) and CMSIS naming style (__NOP). ACLE's is closer to the Rust naming style.
- Decide what system register API will be used. CMSIS specifies __get_BASEPRI(); ACLE specifies __arm_rsr("BASEPRI") where the argument must be a constant (can't be a variable). The latter API can't be implemented in today's Rust as it requires "const generics".
- Write and submit an RFC that covers / contains:
  - The list of intrinsics to stabilize and their API. Which intrinsics are available on which targets (e.g. BASEPRI is Cortex-M specific).
  - The rationale for stabilizing these intrinsics ahead of the SIMD ones: e.g. "using instructions like WFI on stable requires external assembly because inline assembly (asm!) is not stable; this adds a build dependency on arm-none-eabi-gcc. Having these intrinsics in core::arch::arm would let us drop the build dependency on arm-none-eabi-gcc and would (slightly) improve the performance and code size of programs that use these intrinsics"
  - The rationale for the two decisions around naming style and the system register API.
  - How these intrinsics are accessed and the rationale behind this decision: no runtime detection (e.g. cpuid) is required; the intrinsics are conditionally available on some subarchitectures (e.g. "mclass" / Cortex-M).
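One way to picture the system register decision is sketched below, under stated assumptions. A CMSIS-style API is one free function per register. An ACLE-style API takes the register as a compile-time argument, which this sketch approximates with marker types instead of a string constant. The trait `SysReg`, the marker types, the `imp` module and the function `rsr` are all hypothetical names. The ARM branch assumes a Cortex-M target that has these registers, and the non-ARM branch returns a placeholder 0 so the shape of the API can be tried on a host.

```rust
/// A system register that can be selected at compile time.
pub trait SysReg {
    const NAME: &'static str;
    fn read() -> u32;
}

/// Marker types standing in for the ACLE string argument.
pub struct Basepri;
pub struct Primask;

#[cfg(target_arch = "arm")]
mod imp {
    // CMSIS-style: one function per register, register name in the template.
    pub fn basepri() -> u32 {
        let r: u32;
        unsafe { core::arch::asm!("mrs {0}, BASEPRI", out(reg) r, options(nomem, nostack, preserves_flags)) };
        r
    }
    pub fn primask() -> u32 {
        let r: u32;
        unsafe { core::arch::asm!("mrs {0}, PRIMASK", out(reg) r, options(nomem, nostack, preserves_flags)) };
        r
    }
}

#[cfg(not(target_arch = "arm"))]
mod imp {
    // Host placeholders: there is no such register to read here.
    pub fn basepri() -> u32 { 0 }
    pub fn primask() -> u32 { 0 }
}

impl SysReg for Basepri {
    const NAME: &'static str = "BASEPRI";
    fn read() -> u32 { imp::basepri() }
}

impl SysReg for Primask {
    const NAME: &'static str = "PRIMASK";
    fn read() -> u32 { imp::primask() }
}

/// ACLE-flavoured entry point: the register is a type parameter, so it is
/// always known at compile time.
pub fn rsr<R: SysReg>() -> u32 {
    R::read()
}
```

A call then reads `rsr::<Basepri>()`, and an unsupported register is a compile error rather than a runtime failure. That is the property the ACLE constant-argument rule is after.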
People interested in helping out should leave a comment in this thread or contact japaric on IRC (#rust-embedded) or Discord (#wg-embedded).
cc @paoloteti
re __BKPT.
Adding const generics is very likely to be backwards compatible, so we could expose __BKPT() -> bkpt 0 now and add a defaulted const generic later on without breaking anybody.
I personally haven’t had a need for bkpt <nonzero> ever in my life, and I’m sure that bkpt 0 will be sufficient for some 99.99% of use-cases.
CMSIS, which in the end is a HAL spec and not a compiler spec, contains intrinsics only as wrappers around ACLE (see the armcc wrappers in cmsis_armcc.h as an example).
So ACLE is the right choice, and CMSIS can be just a normal crate on top of ACLE.
SIMD32/DSP intrinsics are already based on ACLE.
As this mainly affects the Cortex-M ecosystem we should have someone on the @rust-embedded/cortex-m team champion this work. (This doesn't mean that you have to implement this; mentoring / helping a collaborator to implement this is also valid).
Solving https://github.com/rust-lang-nursery/stdsimd/issues/437#issuecomment-408810110 is the first step towards implementing this.
I have worked with asm instructions before, so I can certainly help. Reading the referenced comments and PRs I do not quite see the issue, as the difference between a HAL spec and a compiler spec has already been stated.
@japaric To clarify, do you want help simply pushing it forward (it seems you already did the implementation) or to make the decision on the CMSIS / ACLE discussion?
@korken89 we want to stabilize the non-SIMD instructions (WFI, CPSID, etc.) as that would let us drop the build dependency on arm-none-eabi-gcc in a few crates (cortex-m and cortex-m-rt) w/o requiring nightly. The details about the contents of the RFC are in https://github.com/rust-embedded/wg/issues/63#issuecomment-408509178. One of the questions that needs to be answered to write the RFC is ACLE vs CMSIS. The above comment has link(s) to more discussions on that topic.
@japaric Thanks for the clarification! I have never written an RFC before, but I'd very much like to learn. So if no-one else with more experience wants to take it (or if my lack of experience is a problem), I can do this. Would be a good learning opportunity for me in the procedures of the Rust ecosystem.
A summary of "What is ACLE" and "What is CMSIS" from the point-of-view of "What should a compiler implement" would probably be a good start. It isn't really necessary to put that into an RFC or anything right now. Just posting it there as a comment would be enough to keep the discussion moving and unblock future work.
The only thing we are waiting for right now is for somebody to make a good case for which intrinsics should be implemented and why. We tried that before, but we have learned new things, so it is time to briefly re-evaluate whether CMSIS was the right choice or not (or whether we should provide only ACLE, or also ACLE, etc.).