svd2rust icon indicating copy to clipboard operation
svd2rust copied to clipboard

Bit-banded access

Open chrysn opened this issue 6 years ago • 9 comments

in writing the EFM32 HAL, I sometimes end up implementing an abstract peripheral in unsafe Rust, where I take a register the I don't actually have a mutable reference to, modify that register. The unsafeness is justified because Rust can't express that my mutable reference (eg. to a GPIO pin) conceptually owns bit 14 of some half a dozen registers (and would hold immutable references to others).

So far, I could usually get away either with directly writing to the register (eg. because it's a write-only set-state/clear-state register), or using bit-banding to set or clear an individual bit in a register without doing a read-modify-write, which can't be justified (because another peripheral might simultaneously do that in an interrupt). However, the latter is only easily possible for full-i32 registers where I can get a pointer of the register and know numerically which bit to set; now that I want to apply the same to a bit field defined via svd2rust, I can't do that. (I can't change_bitband(my_register, which_bit, true) because the svd2rust-generated crate does not tell me the number of the named bit I'd like to access).

I suggest that architecture-specific bit-banding implementations be used to allow atomic setting/clearing of bits in a register. The .modify() API probably can't do that, so an additional API might be necessary that would set single-bit fields one-at-a-time. Even ignoring all the unsafe stuff above, that would probably allow more efficient setting of single bits compared to read-modify-right.

That API could then justifiably be used in unsafe code where .modify() is out of the question.

chrysn avatar Jun 22 '18 09:06 chrysn

Straw-men API usage proposals:

  • hfperclken0.bit_modify(|b| b.i2c0().set_bit()) (to stay similar to write and modify APIs; different bits could be chained but would be executed sequentially)
  • hfperclken0.bit_modify().i2c0().set_bit() (because we don't actually need a closure to accumulate changes when they don't happen at the same time anyway)

chrysn avatar Jun 22 '18 09:06 chrysn

Before I found svd2rust I and a co-worker had implemented a crate for stm32f2 that did single-bit writes with bitbanding, and it's something I miss quite a lot. We saw a big code-size reduction, and a 5-10% performance improvement, when turning that on (though, granted, that code was peripheral-heavy).

(One issue we ran into was trying to use bitbanding to clear w1c bits, though -- that is, "write 1 to clear" bits, like interrupt status registers usually work. The hardware turns the bitband access into a read-modify-write sequence on the bus, so any other w1c bits that were set in the 32-bit word being modified were also getting cleared, because the modified value kept those bits at 1. So any register that has any w1c bit in it is unsafe to bitband.)

But I think this concept could be extended even farther. Some Cortex microcontrollers have bitbanding, which is great as far as it goes, but others have a more-general "bit-manipulation engine" on the AHB (in particular, Nexperia/NXP/Freescale's Kinetis series) that has basically its own instruction set in the addresses being used. It can flip individual bits, set entire bitfields (so this isn't limited to just modifying a single bit anymore! ...but the bits do have to be sequential, so it's not possible to, say, initialize most of an stm32f2's I2C0_CR1 config at once, because of the reserved bit in between the SMBUS and PE bits, and the SMBTYPE bit), and of course do individual bit-set and bit-clear operations.

One issue I see with the second proposed API (that is, getting rid of the callback) is that I don't see when the write would happen, since Drop impls aren't guaranteed to be called, and "set" versus "clear" are two different actions on the bitbanding addresses. (As well as being different with BME; they're writes to different virtual addresses.)

But the first API, with callbacks, could work pretty well with the BME as well as written. Or it could be made more specific:

hperfclken0.bit_set(|w| w.i2c0()) could set this bit with either BME or bitbanding hperfclken0.bit_clear(|w| w.i2c0()) could clear it, again with either BME or bitbanding

For BME-only operations:

hperfclken0.bit_flip(|w| w.i2c0()) could flip it

tpm0conf.set_field(|w| w.trgsel(TPM0CONF::TRGSEL::CMP0_OUT)) could set the sub-field to the bits represented by the given enum value (CMP0_OUT is 0b0001; the trgsel field is in bits 24-27 of the register).

What I don't know is how to signal to svd2rust that the microcontroller in question has a BME or bitbanding capability, or how to control which virtual addresses are used. It might be possible to just rely on the feature flags requested by the importing crate, and hope that nobody turns this on for microcontrollers that don't support it. Or we could try to get some specification into the SVD files themselves (which would allow encoding the bitfields that need setting in the address, in some way, which would let the microcontrollers put bitbanding or BME somewhere else in the memory map).

WDYT?

BryanKadzban avatar Sep 03 '18 18:09 BryanKadzban

...Oh wow that example call was way off, both given the API of svd2rust and the SVD file I have for the KL03 microcontroller.

It would be more like:

Peripherals::take().TPM0.conf.set_field(|w| w.trgsel()._0001());

The conf::W struct can keep track of which bits have been modified during the callback, and the tpm0::CONF impl's set_field can use that bit range to construct a BME address for the operation.

Looking at the code more, though, I think this will also require changes to the vcell crate, to keep all the actual str/ldr code generation happening in one place. That would make the change a fair bit more complicated (as it would require changes to more than one crate), but would be more general, so I think it's probably the right way to go.

BryanKadzban avatar Sep 04 '18 05:09 BryanKadzban

One issue I see with the second proposed API (that is, getting rid of the callback) is that I don't see when the write would happen, since Drop impls aren't guaranteed to be called

That API would call immediately when you .set_bit(), just like .read() immediately reads the register. (Obviously, that API would not be suitable for the more elaborate BMEs you mentioned would allow, but it's rather aiming for the simple case).

We might need to take different approaches for the simple case and the complex one. If we introduce general .bit_set() and .bit_clear() operations, the expectation there might be that those sets happen in an atomic fashion, which the bit-banding of the EFM32 devices can't do.

As for how to get the information in, I have no plan yet; I'd rather not rely on the SVDs (hoping they will evolve into something where I can actually be generic over TIMER or so), but whatever works; my best guess right now would be an argument to svd2rust.

chrysn avatar Jan 24 '19 11:01 chrysn

the expectation there might be that those sets happen in an atomic fashion, which the bit-banding of the EFM32 devices can't do.

I have recently started working on a HAL crate for MSP432 devices; they have a similar digital I/O control register structure where a pin's management is spread across many registers with one bit per pin relevant per register. I found your bitband implementation in the EFM32 HAL (which I guess is originally @thejpster's), but it didn't work for me as-is and I did a bit of research as a result.

  1. Bit-band access is an optional feature specified as part of the Cortex M3 and Cortex M4 core architectures, so my guess it that we'd want to implement this as a feature flag similar to how interrupts are enabled in conjunction with cortex-m-rt. It can be implemented in lower-numbered (M0, M0+, etc.) Cortex processors as an AHB-Lite bus wrapper, but it's not really standard there. I think this is where e.g. NXP's BME engine came from based on the docs I found for it.

  2. It relies on features of the AHB-Lite bus (which is why it doesn't appear on Cortex M7; M7 uses a full AHB bus for better performance), namely it is required to use locked bus transfers, which guarantee an atomic read-modify-write cycle. As this is a feature specified by ARM as part of the Cortex M3/M4 architecture, I don't think it's possible (modulo bugs, of course) to say you implement bit-band access without doing it the way ARM specifies. So EFM32 (and any other chips supporting bit-banding) should provide atomic access, and at the exact same memory regions as any other Cortex M3/M4 processor that supports bit-band access.

  3. The thing that bit me: The bus access to the peripheral or SRAM region will be accessed with the same transfer width that is used in the associated bit-band region. In my case, some of the peripherals only support 8- and 16-bit transfers, while EFM32 peripherals apparently support 32-bit transfers. So it's important for the call that actually dereferences the bit-band address to do so via the right pointer type, rather than always just using a *mut u32.

Because this is actually a tightly-specified feature of Cortex-M3/M4 cores themselves and is guaranteed to perform atomic modification, I think this is an excellent candidate for integration with svd2rust, vcell, and possibly cortex-m-rt. When the feature flag is enabled, I think any single-bit operation in a bit-band region should default to a bit-band access for read, and atomic bit set/clear operations should be provided that use bit-banding rather than direct access, but they should probably not be the default as they won't necessarily do what you want with a write-to-clear register.

Manufacturer-specific features like BME would be nice as well; if they can be implemented via the same or at least similar hooks, all the better. But bit-band access seems like the first logical step.

pinealservo avatar Mar 28 '19 20:03 pinealservo

@pinealservo Thanks for your insights. I think it only makes sense to implement bit-banding at this scale if it's an intrinsic feature of the architecture one can rely on. Given that this bit-banding on Cortex-M is specific to M3 and M4 and a lot of SVD files don't even indicate the type of core, it doesn't seem to be very useful to try to automagically create some abstraction which a user cannot even rely on.

therealprof avatar Mar 29 '19 08:03 therealprof

@therealprof Hmm, that's a different reaction than I was expecting. Bit-banding is an intrinsic feature of the architecture that one can rely on when it's present; it's been present and working exactly the same on every single Cortex-M3 and Cortex-M4 part I've ever read the manuals for, and that's been a pretty large number from a variety of manufacturers. It's even integrated with the Keil C compiler such that you can pass a "--bitband" flag and it'll automatically switch every single-bit structure field access through an absolute address to a bit-band operation.

Getting efficient atomic read-modify-write access to single-bit register fields seems like a pretty big deal, especially with GPIO pin registers that need to be split between "owners", and a lot of Cortex M3/M4 register sets seem to be designed with the assumption that you'll be using bit-band accesses.

Now, granted, you won't be able to write code at the peripheral access crate level that will work the same across all Cortex-M family devices and still take advantage of bit-band operations. There's already no way to do that with things that need to be atomic. But a HAL maintainer for a particular device family could certainly make use of them where available to build safe and performant trait implementations. Right now, it's really awkward to do, since it requires bypassing most of the existing abstraction that svd2rust builds.

I'm going to continue experimenting with it anyway; hopefully it'll work out to be generally useful, but I guess we'll see!

pinealservo avatar Mar 29 '19 10:03 pinealservo

@pinealservo As you said it yourself, it's only available on M3 and M4 and actually a deprecated architectural feature and also manufacturers seem to be slowly moving aways from M3 in favour of M33.

and a lot of Cortex M3/M4 register sets seem to be designed with the assumption that you'll be using bit-band accesses.

Is that so? I have the impression that quite a few manufacturers have a large interest of keeping the peripherals (close to) identical for all their chip families which precludes requiring bit-banding for healthy access. I don't recall a single vendor not offering at least one Cortex-M0(+) based family...

Right now, it's really awkward to do, since it requires bypassing most of the existing abstraction that svd2rust builds.

I agree.

I'm going to continue experimenting with it anyway; hopefully it'll work out to be generally useful, but I guess we'll see!

Great. It's always good to see efforts making fancy features work!

therealprof avatar Mar 29 '19 10:03 therealprof

I'd like to see bit-banded APIs in svd2rust. They are used a lot in the TM4C crate - for example, there are a lot of 8-bit registers (or 8-bit fields in registers) in the GPIO registers, where each bit corresponds to an I/O pin on that port. The read/modify/write cycle to set or clear a bit takes more instructions that just directly setting or clearing the relevant bit. It's also inherently atomic, so it produces APIs that can be shared with interrupt routines (say if your ISR uses Pin A0, and your main thread uses PA1).

See https://docs.rs/tm4c123x-hal/0.10.0/tm4c123x_hal/bb/index.html

thejpster avatar Jul 19 '20 11:07 thejpster