
Stable assembly operations

Open japaric opened this issue 6 years ago • 42 comments

Triage(2018-08-21)

A pre-RFC discussing what should go in core::arch::arm has been opened in #184

Update - 2018-07-27

Intrinsics like __NOP are now available in core::arch::arm. Path towards stabilization is being discussed in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.

Help wanted

We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm. Details in https://github.com/rust-lang-nursery/embedded-wg/issues/63#issuecomment-408509178.


One of the features that ties embedded development to the nightly channel is the asm! macro, and by extension the global_asm! macro.

In some cases it's possible to turn an unstable asm! call into a stable FFI call that invokes some subroutine that comes from a pre-compiled external assembly file. This alternative comes at the cost of a function call overhead per asm! invocation.

In other cases the function call overhead of the FFI call breaks the intended semantics of the original asm! call; this can be seen when reading registers like the Program Counter (PC) or the Link Register (LR).
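A minimal sketch of the FFI approach described above. It assumes a hypothetical symbol `__wfi`, assembled from an external file and linked into the final binary; the symbol name, the `cfg` conditions and the host stand-in are illustrative assumptions, not taken from any particular crate.

```rust
// Declared only when building for a bare-metal ARM target. `__wfi` is a
// hypothetical routine assembled from an external `.s` file (roughly
// `__wfi: wfi; bx lr`) and linked into the final binary.
#[cfg(all(target_arch = "arm", target_os = "none"))]
extern "C" {
    fn __wfi();
}

/// Waits for an interrupt by calling the pre-assembled routine.
/// Unlike inline assembly, this costs a branch-and-link plus a return.
#[cfg(all(target_arch = "arm", target_os = "none"))]
pub fn wfi() {
    unsafe { __wfi() }
}

/// Host stand-in so the sketch builds and runs off-target: does nothing.
#[cfg(not(all(target_arch = "arm", target_os = "none")))]
pub fn wfi() {}
```

The same shape cannot implement something like "read PC" or "read LR", because the call itself changes the values being read.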

The asm! feature is hard to stabilize because it's directly tied to LLVM; rustc literally passes the contents of asm! invocations to LLVM's internal assembler. It's not possible to guarantee that the syntax of LLVM assembly won't change across LLVM releases so stabilizing the asm! feature in its current form is not possible.

This ticket is about exploring making operations that require assembly available on the stable channel. This proposal is not about stabilizing the asm! macro itself.

The main idea is that everything that requires assembly and can be implemented as external assembly files should use that approach. For everything that requires inlining the assembly operation the proposal is that the core crate will provide such functionality.

This idea is not unlike what's currently being done in stdsimd land: core::arch provides functions that are thin wrappers around unstable LLVM intrinsics that provide SIMD functionality. Similarly core::asm (module up for bikeshedding) would provide functionality that requires asm! but in a stable fashion. Example below:

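mod asm {
    mod arm {
        // `#[stable]` is a placeholder for the real stability attribute.
        #[inline(always)]
        #[stable]
        pub fn pc() -> u32 {
            // R15 is the Program Counter (PC).
            let r: u32;
            unsafe { asm!("mov $0,R15" : "=r"(r) ::: "volatile") }
            r
        }

        #[inline(always)]
        #[stable]
        pub fn lr() -> u32 {
            // R14 is the Link Register (LR).
            let r: u32;
            unsafe { asm!("mov $0,R14" : "=r"(r) ::: "volatile") }
            r
        }
    }
}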

This way the functionality can be preserved across LLVM upgrades; the maintainers of the core crate would update the implementation to match the current LLVM assembly syntax.
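For comparison, here is a rough sketch of the same "thin wrapper" idea on a toolchain that has the stabilized `core::arch::asm!` macro (Rust 1.59 or later), which did not exist when this issue was opened. The module name `asm_ops` and the function set are hypothetical; callers only see safe functions, so the assembly syntax inside could change without affecting them.

```rust
/// Hypothetical facade module; the name and contents are illustrative only.
pub mod asm_ops {
    /// Executes a single no-op instruction on targets where `asm!` is stable.
    #[cfg(any(
        target_arch = "arm",
        target_arch = "aarch64",
        target_arch = "x86_64",
        target_arch = "riscv32",
        target_arch = "riscv64"
    ))]
    #[inline(always)]
    pub fn nop() {
        // SAFETY: `nop` touches neither memory, the stack, nor the flags.
        unsafe {
            core::arch::asm!("nop", options(nomem, nostack, preserves_flags));
        }
    }

    /// Fallback for any other target: does nothing.
    #[cfg(not(any(
        target_arch = "arm",
        target_arch = "aarch64",
        target_arch = "x86_64",
        target_arch = "riscv32",
        target_arch = "riscv64"
    )))]
    #[inline(always)]
    pub fn nop() {}

    /// Reads the Program Counter. Only defined for 32-bit ARM targets.
    #[cfg(target_arch = "arm")]
    #[inline(always)]
    pub fn pc() -> u32 {
        let r: u32;
        // SAFETY: copies PC into a general-purpose register, nothing else.
        unsafe {
            core::arch::asm!("mov {0}, pc", out(reg) r, options(nomem, nostack, preserves_flags));
        }
        r
    }
}
```

`#[inline(always)]` matters for `pc()`: if the wrapper were compiled as a real call, the value read would belong to the wrapper rather than to the caller.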

TODO (outdated)

  • Come up with a list of assembly operations that are used in practice and that would need to be provided in core::asm.
    • I would appreciate some help making a Proof of Concept PR against cortex-m that moves most of the assembly operations to external assembly files by enabling some Cargo feature. What can't be moved to assembly files would be a candidate for inclusion in core::asm.
    • AVR
    • MSP430
    • RISCV
  • [ ] The team will bring up this idea to the lang / compiler team during the Rust All Hands event.

cc @gnzlbg @nagisa could we get your thoughts on this? does this sound feasible, or does this sound like a bad idea?

cc @dvc94ch @dylanmckay @pftbest could you help me identify assembly operations that require inline assembly on AVR, MSP430 and RISCV?

japaric avatar Mar 14 '18 11:03 japaric

Is there any sequence of instructions for which a single call to an asm! macro containing the whole sequence is not equivalent to a sequence of asm! macro calls with one call per instruction?

I am thinking here about compiling in debug mode without optimizations, where the generated code might differ, potentially in ways that break it.

gnzlbg avatar Mar 14 '18 12:03 gnzlbg

I’ve always been a strong proponent of the "intrinsic" functions that map down to a single instruction. For example, instead of writing asm!("cpsid" ...), you’d call a function that does the same thing, looks the same and behaves mostly the same as the asm call. Not only is it harder to mess up when using these intrinsics, but the compiler also gets more information about the code.

Of course they have their downsides, but for most of the use cases in embedded, intrinsics are more than sufficient. In my experience using the IAR toolchain, which exposes such intrinsics for most of the important instructions, I’ve only had to maintain assembly to implement exactly two things: atomic byte increment and the context switch handler. Both of them implemented in an external assembly file.

I believe that we could work on adding intrinsics to rustc (LLVM doesn’t have a platform intrinsic for ARM cpsid, for example, but it has a llvm.returnaddress for lr, llvm.read_cycle_counter for cycle count and llvm.read_register to read arbitrary registers) and then the wrappers to stdsimd.

... So basically, what your proposal says.

nagisa avatar Mar 14 '18 12:03 nagisa

@gnzlbg yes, the compiler is free to insert whatever code it wishes between the separate asm! statements to satisfy the constraints. For example it could decide to spill and reload the registers between two asm! calls.

nagisa avatar Mar 14 '18 12:03 nagisa
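To make the point about separate asm! statements concrete, here is a small sketch in the syntax of the later, stabilized `core::arch::asm!` macro (an assumption: it postdates this thread). x86_64 is used only so the example can run on a desktop host; the function name is made up. The intermediate value lives in a register between the two instructions, which is only guaranteed because both sit in one block.

```rust
/// Computes `x + y` with a two-instruction sequence kept inside ONE asm! block.
/// Between `mov` and `add` the partial result lives in the `sum` register;
/// the compiler cannot insert spills or reloads in the middle of the block.
/// Split across two asm! blocks, nothing would guarantee that `sum` is still
/// in the same register when the second block starts.
#[cfg(target_arch = "x86_64")]
pub fn add_in_one_block(x: u64, y: u64) -> u64 {
    let sum: u64;
    // SAFETY: pure register arithmetic; no memory or stack access.
    unsafe {
        core::arch::asm!(
            "mov {sum}, {x}",
            "add {sum}, {y}",
            sum = out(reg) sum,
            x = in(reg) x,
            y = in(reg) y,
            options(pure, nomem, nostack),
        );
    }
    sum
}

/// Fallback for other hosts: same result (wrapping addition), no assembly.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_in_one_block(x: u64, y: u64) -> u64 {
    x.wrapping_add(y)
}
```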

Thanks @nagisa that makes sense.

So my opinion is that it only makes sense to do this after we have stabilized inline assembly, which kind of defeats the point of doing this in the first place.

IIUC the idea behind this is that if we expose the hardware ISA via intrinsics then nothing will break if LLVM changes the syntax or semantics of inline assembly because we will be able to fix this in core.

But this is the problem: core will break.

That is, somebody will need to fix core, and the amount of work required to fix core is linearly proportional to the usage of inline assembly within core. This might be a lot of work already, but if we go this route that will turn into a titanic amount of work.

The same would happen if we decided to add another backend to rustc and had to reimplement all usages of inline assembly in "Cretonne-syntax".

So IMO the only way to make upgrading to a new syntax/backend realistic is to turn the porting effort from O(N) in the usage of inline assembly to O(1).

The only way I can think of to achieve this is to stabilize the inline assembly macro enough (e.g. like this https://internals.rust-lang.org/t/pre-rfc-inline-assembly/6443), so that upgrading can be done in Rust by "just" mapping Rust's inline assembly syntax to whatever syntax a new or upgraded backend uses. This will be a lot of work, but at least it's independent of how often inline assembly is used.

Once we are there, we might just as well stabilize inline assembly instead of pursuing this. Libraries that implement this can be developed in the ecosystem, and "core" intrinsics can continue to be added to stdsimd which already exposes some intrinsics via inline assembly (some core stdsimd modules like run-time feature detection are built around inline assembly).

Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.

gnzlbg avatar Mar 14 '18 13:03 gnzlbg

The intrinsics would be implemented in the backend, rather than libcore. While it is true that this would increase the burden when upgrading a backend, it wouldn’t be any greater than the burden of adapting whatever we stabilise as our inline assembly implementation.

nagisa avatar Mar 14 '18 14:03 nagisa

Then there is also the issue that two consecutive asm! calls allow LLVM to insert code in between. This might not matter often in practice, but sounds a bit brittle to me.

With volatile assembly statements, the "code" would be limited to code that would be necessary to satisfy the constraints of the asm! statements. So, I think just moves from and to memory.

nagisa avatar Mar 14 '18 14:03 nagisa

why can't global_asm! be stabilized? Isn't that just like an external assembly file?

for riscv it would be quite a few intrinsics I believe. this is what I'm doing to r/w csr regs:

#[cfg(target_arch = "riscv")]
macro_rules! csr_asm {
    ($op:ident, $csr:expr, $value:expr) => (
        {
            let res: usize;
            unsafe {
                asm!(concat!(stringify!($op), " $0, ", stringify!($csr), ", $1")
                     : "=r"(res)
                     : "r"($value)
                     :
                     : "volatile");
            }
            res
        }
    )
}

the $csr value isn't an operand that can be loaded from a register, so there would need to be an intrinsic for each csr.

dvc94ch avatar Mar 14 '18 20:03 dvc94ch
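The "one intrinsic per CSR" idea can be sketched with a macro that stamps out one read function per register number. This is a read-only variant written in the later, stabilized `core::arch::asm!` syntax (an assumption relative to this thread); the macro name, the function names and the `Option` return on non-RISC-V hosts are all hypothetical choices made so the sketch runs anywhere.

```rust
/// Generates `pub fn $name() -> Option<usize>` that reads the CSR whose
/// number is baked into the instruction text at compile time.
macro_rules! read_csr_fn {
    ($name:ident, $csr:literal) => {
        #[cfg(any(target_arch = "riscv32", target_arch = "riscv64"))]
        #[inline]
        pub fn $name() -> Option<usize> {
            let value: usize;
            // `csrrs rd, csr, x0` reads the CSR without modifying it.
            // The CSR number must be an immediate, hence `stringify!`.
            unsafe {
                core::arch::asm!(
                    concat!("csrrs {0}, ", stringify!($csr), ", x0"),
                    out(reg) value,
                    options(nomem, nostack),
                );
            }
            Some(value)
        }

        /// Host stand-in: there is no such register to read.
        #[cfg(not(any(target_arch = "riscv32", target_arch = "riscv64")))]
        pub fn $name() -> Option<usize> {
            None
        }
    };
}

// One generated function per CSR of interest (numbers per the RISC-V spec).
read_csr_fn!(read_cycle, 0xC00);
read_csr_fn!(read_mstatus, 0x300);
```

Whether a given CSR may be read depends on the privilege level the code runs at; the sketch does not check that.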

For reference, here is ARM's C Language Extensions 2.0 document:

ARM® C Language Extensions Release 2.0

I believe these were implemented by ARM's in-house compilers and by GCC according to this page: 6.59.7 ARM C Language Extensions (ACLE)

I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics. Having a standard to point to should make it easier to get them implemented.

For instance, section "8.4 Hints" describes wfi, wfe, sev, sevl, yield, and dbg. "8.7" describes nop, which I think already has an LLVM intrinsic defined since pretty much all instruction sets have some variant of that instruction. "10.1 Special Registers" covers a variety of special registers.

jcsoo avatar Mar 14 '18 21:03 jcsoo

could you help me identify assembly operations that require inline assembly on AVR

I've taken a look over the full instruction set listing, here's what stands out

Assembly operations that are often required for nontrivial programs:

  • cli/sei instructions to globally disable or enable interrupts

Somewhat more obscure stuff:

  • If watchdog timers are enabled by the programmer, the wdr instruction must be executed regularly, otherwise the chip will reset. WDR is short for "watchdog timer reset"
  • The des instruction for DES encryption. Most existing software would do this in, well, software, so not major
  • The sleep instruction - a la std::thread::yield_now()

Unlike AVR-GCC, LLVM transparently handles accesses to and from program memory, meaning that that whole class of operations doesn't require asm! magic.

Although currently unimplemented, I believe LLVM's atomic intrinsics could be mapped directly to the RMW atomic instructions (FWIW these map to something like cli; <operation>; sei).

dylanmckay avatar Mar 17 '18 07:03 dylanmckay
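The `cli; <operation>; sei` pattern mentioned above can be sketched as a closure-based critical section. This is a host-side simulation: the global interrupt flag is modelled by an `AtomicBool`, and `cli`, `sei`, `without_interrupts` and `fetch_add_u8` are hypothetical names. On a real AVR target the two stand-in functions would execute the corresponding instructions instead.

```rust
// In a no_std crate these would come from `core::sync::atomic`.
use std::sync::atomic::{AtomicBool, Ordering};

// Host stand-in for the global interrupt-enable flag in AVR's status register.
static INTERRUPTS_ENABLED: AtomicBool = AtomicBool::new(true);

/// Stand-in for the `cli` instruction: globally disables interrupts.
pub fn cli() {
    INTERRUPTS_ENABLED.store(false, Ordering::SeqCst);
}

/// Stand-in for the `sei` instruction: globally enables interrupts.
pub fn sei() {
    INTERRUPTS_ENABLED.store(true, Ordering::SeqCst);
}

/// Reports the simulated state of the interrupt-enable flag.
pub fn interrupts_enabled() -> bool {
    INTERRUPTS_ENABLED.load(Ordering::SeqCst)
}

/// Runs `f` as `cli; <operation>; sei`. Interrupts are re-enabled only if
/// they were enabled on entry, so nested calls behave sensibly.
pub fn without_interrupts<R>(f: impl FnOnce() -> R) -> R {
    let was_enabled = interrupts_enabled();
    cli();
    let result = f();
    if was_enabled {
        sei();
    }
    result
}

/// A read-modify-write made uninterruptible by masking interrupts.
/// Returns the previous value, like an atomic fetch-add.
pub fn fetch_add_u8(cell: &mut u8, delta: u8) -> u8 {
    without_interrupts(|| {
        let old = *cell;
        *cell = old.wrapping_add(delta);
        old
    })
}
```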

I believe that the stdsimd group knows about the NEON extensions but probably hasn't put much thought into other ARM intrinsics.

stdsimd does offer some v7 and v8 intrinsics, at least rev and rbit.

gnzlbg avatar Mar 17 '18 10:03 gnzlbg

I discussed this with @alexcrichton during Rust All Hands and he said he was fine with adding a stable API for assembly ops that have an unstable implementation (e.g. inline asm!). We'll make an RFC for the ARM Cortex-M ops, but first we have to figure out exactly what ops need to be inlined (will likely be less than 10 ops, maybe just 5 or so).

japaric avatar Apr 03 '18 13:04 japaric

@japaric would those go into std::arch ?

gnzlbg avatar Apr 03 '18 13:04 gnzlbg

@gnzlbg That can be bikeshed in the RFC

japaric avatar Apr 03 '18 13:04 japaric

In order to understand which intrinsics might be needed for the various Cortex-M processors, I extracted these two lists from the CMSIS-Core reference, excluding armv8-only instructions.

The first group of intrinsics is "Core Register Access". These are mainly wrappers around the CPSID, CPSIE, MRS, and MSR instructions, except for get_FPSCR and set_FPSCR.

https://www.keil.com/pack/doc/CMSIS/Core/html/group__Core__Register__gr.html

// Implemented using CPSID and CPSIE

disable_fault_irq - CPSID f
disable_irq - CPSID i
enable_fault_irq - CPSIE f
enable_irq - CPSIE i

// Mostly implemented using MRS

get_APSR
get_BASEPRI
get_CONTROL
get_FAULTMASK
get_FPSCR - M4, M7
get_IPSR
get_MSP
get_PRIMASK
get_PSP
get_xPSR

// Mostly implemented using MSR

set_APSR
set_BASEPRI
set_CONTROL
set_FAULTMASK
set_FPSCR - M4, M7
set_IPSR
set_MSP
set_PRIMASK
set_PSP
set_xPSR

The second group of intrinsics provides access to CPU instructions. Each of these is a wrapper around a specific instruction.

Some of these instructions have direct equivalents in core::intrinsics or are used for implementing numeric primitives. There are also many instructions that are likely used in the existing atomics support.

https://www.keil.com/pack/doc/CMSIS/Core/html/group__intrinsic__CPU__gr.html

* - in core or core::intrinsics or stdsimd
** - atomic

_NOP
_WFI - Wait For Interrupt
_WFE - Wait For Event
_SEV - Send Event
_BKPT* - Set Breakpoint - core::intrinsics::breakpoint()
_ISB - Instruction Synchronization Barrier
_DSB - Data Synchronization Barrier
_REV* - Reverse Byte Order (32 bit) - u32.swap_bytes()
_REV16* - Reverse Byte Order (16 bit) - u16.swap_bytes()
_REVSH - Reverse Byte Order Signed (16 bit)
_RBIT* - Reverse Bit Order
_ROR* - Rotate Right - u32.rotate_right()
_LDREXB** - LDR Exclusive (8 bit) - Not M0, M0+
_LDREXH** - LDR Exclusive (16 bit) - Not M0, M0+
_LDREXW** - LDR Exclusive (32 bit) - Not M0, M0+
_STREXB** - STR Exclusive (8 bit) - Not M0, M0+
_STREXH** - STR Exclusive (16 bit) - Not M0, M0+
_STREXW** - STR Exclusive (32 bit) - Not M0, M0+
_CLREX** - Remove Exclusive Lock - Not M0, M0+
_SSAT - Signed Saturate - Not M0, M0+
_USAT - Unsigned Saturate - Not M0, M0+
_CLZ* - Count Leading Zeros - u32.leading_zeros()
_RRX - Rotate Right with Extend (32 bit)
_LDRBT - LDRT Unprivileged (8 bit)
_LDRHT - LDRT Unprivileged (16 bit)
_LDRT - LDRT Unprivileged (32 bit)
_STRBT - STRT Unprivileged (8 bit)
_STRHT - STRHT Unprivileged (16 bit)
_STRT - STRT Unprivileged (32 bit)

In the first group, disable_fault_irq, disable_irq, enable_fault_irq, enable_irq seem to be pretty critical. The rest of the get_ / set_ functions are more specialized.

In the second group, NOP, WFI, WFE, SEV, ISB, DSB are ones that I am familiar with and use often. REVSH, RBIT, and RRX seem like they would be primarily for optimization. SSAT and USAT provide more flexibility than the core saturated math primitives by allowing selection of a bit width. LDRxT and STRxT are mainly for unprivileged access checking. The rest should be covered by built-ins and atomics.

jcsoo avatar Apr 26 '18 14:04 jcsoo
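The starred entries in the list above, plus the saturation ops, can already be written in plain Rust. The sketch below uses only stable integer methods from core; the function names are illustrative, and whether the compiler selects the named ARM instruction for each one depends on the target and optimization level.

```rust
/// REV: reverse the byte order of a 32-bit word.
pub fn rev(x: u32) -> u32 {
    x.swap_bytes()
}

/// REV16, following the mapping given in the list above (u16 byte swap).
pub fn rev16(x: u16) -> u16 {
    x.swap_bytes()
}

/// REVSH: byte-swap a signed halfword, then sign-extend to 32 bits.
pub fn revsh(x: i16) -> i32 {
    i32::from(x.swap_bytes())
}

/// RBIT: reverse the bit order of a 32-bit word.
pub fn rbit(x: u32) -> u32 {
    x.reverse_bits()
}

/// ROR: rotate right by `n` bits.
pub fn ror(x: u32, n: u32) -> u32 {
    x.rotate_right(n)
}

/// CLZ: count leading zero bits.
pub fn clz(x: u32) -> u32 {
    x.leading_zeros()
}

/// SSAT: saturate `x` to a signed value that fits in `bits` bits (1..=32).
pub fn ssat(x: i32, bits: u32) -> i32 {
    assert!((1..=32).contains(&bits));
    let max = (1i64 << (bits - 1)) - 1;
    let min = -(1i64 << (bits - 1));
    i64::from(x).clamp(min, max) as i32
}

/// USAT: saturate `x` to an unsigned value that fits in `bits` bits (0..=31).
pub fn usat(x: i32, bits: u32) -> u32 {
    assert!(bits <= 31);
    let max = (1i64 << bits) - 1;
    i64::from(x).clamp(0, max) as u32
}
```

The bit-width parameter on `ssat` and `usat` is what the core saturating arithmetic methods lack, which is the flexibility the comment above refers to.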

Do gcc, clang, or msvc provide any of these as functions ?

gnzlbg avatar Apr 26 '18 16:04 gnzlbg

As far as I can tell, all three implement them via the ACLE (ARM C Language Extensions) specification.

gcc - 6.59.7 ARM C Language Extensions (ACLE)

clang - The arm_acle.h header is shown in their documentation

msvc - ARM Intrinsics

jcsoo avatar Apr 26 '18 19:04 jcsoo

All of it?

The arm_acle.h header is exposed via std::arch::{arm,aarch64}. If all you need is arm_acle.h this doesn't even need an RFC. The only thing required for these to land on nightly is for someone to implement the missing functions in coresimd. For this to land in stable, an automatic verification of the header against the ACLE specification must be enabled in stdsimd, and after that, a mini-FCP would probably be enough.

gnzlbg avatar Apr 26 '18 20:04 gnzlbg

Thanks for the pointer back to coresimd - I looked at this briefly a while back but didn't dig in to figure out the specifics of how to get the additional intrinsics implemented.

coresimd/simd_llvm.rs provides the clue:

extern "platform-intrinsic" {
    pub fn simd_eq<T, U>(x: T, y: T) -> U;
    pub fn simd_ne<T, U>(x: T, y: T) -> U;
    pub fn simd_lt<T, U>(x: T, y: T) -> U;
    pub fn simd_le<T, U>(x: T, y: T) -> U;
    pub fn simd_gt<T, U>(x: T, y: T) -> U;
    pub fn simd_ge<T, U>(x: T, y: T) -> U;
    ...
}

Unfortunately only simd intrinsics are listed there. I'd never seen the platform-intrinsic extern type before, but after a bit of digging I found where they are defined: rust/src/etc/platform-intrinsics/arm.json.

Looking further, stdsimd issue #112 mentions link_llvm_intrinsic which enables using #[link_name="llvm.*"] to access LLVM intrinsics directly.

Implement all x86 vendor intrinsics shows how to create new intrinsics and also gave me enough information to find the list of LLVM arm intrinsics: https://github.com/llvm-mirror/llvm/blob/master/include/llvm/IR/IntrinsicsARM.td

So, as @gnzlbg points out, we should be able to do this through pull requests to stdsimd.

jcsoo avatar Apr 27 '18 13:04 jcsoo

For ACLE using link_llvm_intrinsics is the way to go. If there is an intrinsic that cannot be implemented that way we'll handle those on a case-by-case basis.

gnzlbg avatar Apr 27 '18 13:04 gnzlbg

Update: rust-lang-nursery/stdsimd#437 is tracking adding these assembly operations (instructions) to stdsimd.

japaric avatar May 10 '18 07:05 japaric

Discussed (triaged) in the last meeting:

This is more of a nice to have as it's not required for embedded Rust on stable. If we are to get this done by the edition release these are the final deadlines:

  • 1.28 (2018-08-02). implement rust-lang-nursery/stdsimd#437

  • 1.29 (2018-09-13). subset of most commonly used intrinsics stabilized in beta.

japaric avatar Jul 03 '18 05:07 japaric

Triage: Several CMSIS intrinsics have been implemented and are now available in core::arch::arm. The linked API documentation may not correctly reflect what's currently available in the latest nightly but you can check the source code: cmsis.rs and dsp.rs.

Most of the functionality in cortex-m that requires external assembly is now provided in core::arch::arm as an unstable API, with the notable exception of the BKPT instruction which will be added in rust-lang-nursery/stdsimd#540 once its API and implementation are decided.

The stabilization path for the non-SIMD subset of these intrinsics is being discussed in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.

japaric avatar Jul 27 '18 06:07 japaric

Help wanted

We are looking for someone / people to help us write an RFC on stabilization of the ARM Cortex intrinsics in core::arch::arm.

The RFC needs to cover the points in https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-408480125. Some of these points have been discussed in that thread already starting from https://github.com/rust-lang-nursery/stdsimd/pull/518#issuecomment-406927453.

What needs to be done:

  • Read through the ACLE and CMSIS specifications.
  • Make a list of the non-SIMD intrinsics to stabilize (e.g. DMB, NOP, MSR)

  • Decide between ACLE naming style (__nop) and CMSIS naming style (__NOP). ACLE's is closer to the Rust naming style.

  • Decide what system register API will be used. CMSIS specifies __get_BASEPRI(); ACLE specifies __arm_rsr("BASEPRI") where the argument must be a constant (can't be a variable). The latter API can't be implemented in today's Rust as it requires "const generics".

  • Write and submit an RFC that covers / contains:

    • The list of intrinsics to stabilize and their API. Which intrinsics are available on which targets (e.g. BASEPRI is Cortex-M specific).
    • The rationale for stabilizing these intrinsics ahead of the SIMD ones: e.g. "using instructions like WFI on stable requires external assembly because inline assembly (asm!) is not stable; this adds a build dependency on arm-none-eabi-gcc. Having these intrinsics in core::arch::arm would let us drop the build dependency on arm-none-eabi-gcc and would (slightly) improve the performance and code size of programs that use these intrinsics"
    • The rationale for the two decisions around naming style and the system register API.
    • How these intrinsics are accessed and the rationale behind this decision: no runtime detection (e.g. cpuid) is required; the intrinsics are conditionally available on some subarchitectures (e.g. "mclass" / Cortex-M).

People interested in helping out leave a comment in this thread or contact japaric on IRC (#rust-embedded) or discord (#wg-embedded).

cc @paoloteti

japaric avatar Jul 27 '18 18:07 japaric
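On the system register API question in the list above: one conceivable way to approximate ACLE's constant-argument style without const generics is to name the register with a zero-sized type. The sketch below is hypothetical throughout (`SysReg`, `Basepri`, `rsr`, `get_basepri` are made-up names, not an existing API). The ARM path assumes a bare-metal ARMv7-M or later target and the stabilized `core::arch::asm!` macro; on any other host a static stands in for the register.

```rust
#[cfg(not(all(target_arch = "arm", target_os = "none")))]
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical trait: one zero-sized marker type per system register.
pub trait SysReg {
    const NAME: &'static str;
    fn read() -> u32;
}

/// Marker type standing for the BASEPRI register.
pub struct Basepri;

#[cfg(all(target_arch = "arm", target_os = "none"))]
impl SysReg for Basepri {
    const NAME: &'static str = "BASEPRI";
    fn read() -> u32 {
        let r: u32;
        // SAFETY: MRS only copies the special register into `r`.
        unsafe {
            core::arch::asm!("mrs {0}, BASEPRI", out(reg) r, options(nomem, nostack, preserves_flags));
        }
        r
    }
}

// Host stand-in for the register so the sketch runs off-target.
#[cfg(not(all(target_arch = "arm", target_os = "none")))]
static FAKE_BASEPRI: AtomicU32 = AtomicU32::new(0);

#[cfg(not(all(target_arch = "arm", target_os = "none")))]
impl SysReg for Basepri {
    const NAME: &'static str = "BASEPRI";
    fn read() -> u32 {
        FAKE_BASEPRI.load(Ordering::Relaxed)
    }
}

/// ACLE-flavoured entry point: the register is chosen at compile time
/// through the type parameter instead of a string constant.
pub fn rsr<R: SysReg>() -> u32 {
    R::read()
}

/// CMSIS-flavoured entry point built on top of the generic one.
pub fn get_basepri() -> u32 {
    rsr::<Basepri>()
}
```

With this shape the CMSIS-style getters become thin functions over the generic one, which matches the layering suggested elsewhere in this thread (a CMSIS crate on top of ACLE-style primitives).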

re __BKPT.

Adding const generics is very likely to be backwards compatible, so we could expose __BKPT() -> bkpt 0 now and add a defaulted const generic later on without breaking anybody.

I personally haven’t had a need for bkpt <nonzero> ever in my life, and I’m sure that bkpt 0 will be sufficient for some 99.99% of use-cases.

nagisa avatar Jul 27 '18 19:07 nagisa

CMSIS, which in the end is a HAL spec and not a compiler spec, contains intrinsics only as wrappers around ACLE (see the armcc wrappers in cmsis_armcc.h for an example). So ACLE is the right choice, and CMSIS can be just a normal crate on top of ACLE. The SIMD32/DSP intrinsics are already based on ACLE.

paoloteti avatar Jul 30 '18 09:07 paoloteti

As this mainly affects the Cortex-M ecosystem we should have someone on the @rust-embedded/cortex-m team champion this work. (This doesn't mean that you have to implement this; mentoring / helping a collaborator to implement this is also valid).

Solving https://github.com/rust-lang-nursery/stdsimd/issues/437#issuecomment-408810110 is the first step towards implementing this.

japaric avatar Aug 07 '18 14:08 japaric

I have worked with asm instructions before, so I can certainly help. Reading the referenced comments and PRs I do not quite see the issue as it is stated: the difference between a HAL spec and a compiler spec.

@japaric To clarify, do you want help simply pushing it forward (it seems you already did the implementation) or to make the decision on the CMSIS / ACLE discussion?

korken89 avatar Aug 07 '18 14:08 korken89

@korken89 we want to stabilize the non-SIMD instructions (WFI, CPSID, etc.) as that would let us drop the build dependency on arm-none-eabi-gcc in a few crates (cortex-m and cortex-m-rt) w/o requiring nightly. The details about the contents of the RFC are in https://github.com/rust-embedded/wg/issues/63#issuecomment-408509178. One of the questions that needs to be answered to write the RFC is ACLE vs CMSIS. The above comment has link(s) to more discussions on that topic.

japaric avatar Aug 07 '18 14:08 japaric

@japaric Thanks for the clarification! I have never written an RFC before, but I'd very much like to learn. So if no-one else with more experience wants to take it (or if my lack of experience is a problem), I can do this. Would be a good learning opportunity for me in the procedures of the Rust ecosystem.

korken89 avatar Aug 07 '18 15:08 korken89

A summary of "What is ACLE" and "What is CMSIS" from the point-of-view of "What should a compiler implement" would probably be a good start. It isn't really necessary to put that into an RFC or anything right now. Just posting it there as a comment would be enough to keep the discussion moving and unblock future work.

The only thing we are waiting for right now is for somebody to make a good case for which intrinsics should be implemented and why. We tried that before, but we have learned new things, so it is time to briefly re-evaluate whether CMSIS was the right choice or not (or whether we should provide only ACLE, or also ACLE, etc.).

gnzlbg avatar Aug 07 '18 16:08 gnzlbg