svd2rust icon indicating copy to clipboard operation
svd2rust copied to clipboard

proposal: separate crates for peripherals

Open rmsyn opened this issue 11 months ago • 18 comments

Split peripheral modules into crates

This proposal is to add a CLI option to split peripheral modules into separate crates in a project workspace.

Motivation

For large PAC crates, compilation on machines with less-powerful cores is excruciatingly slow. The bottleneck for any projects depending on the PAC crate is obviously the PAC, because one core is pinged to full usage, while all other cores are idle.

By splitting the PAC into separate crates for each peripheral, each core could work on a separate peripheral crate. The separate crates could also improve incremental compilation, since a change in one of the peripheral crates would not necessarily require recompilation of other peripheral crates.

Challenges

Much of the plumbing is already there for creating separate crates for each peripheral.

Each peripheral already generates its own module.

Peripheral versioning would inherit from the top-level Cargo.toml version.

The main challenge will be making minimal changes to enable the code generation, without adding too much to maintainability burden.

Example

An example project structure might look like:

- Cargo.toml
- <soc>-generic
  - Cargo.toml
  - src
    - lib.rs
- <soc>-i2c
  - Cargo.toml
  - src
    - lib.rs
    - i2c0
    - i2c1
    - ...
    - i2c<N>
- <soc>-spi
  - Cargo.toml
  - src
    - lib.rs
    - spi0
    - spi1
    - ...
    - spi<N>

Alternative approaches

Once the parallel front-end is stable in rustc, the main motivation for this proposal may be less relevant.

However, there will still be users stuck on older versions of rustc where this proposal will be useful.

rmsyn avatar Jan 28 '25 04:01 rmsyn

I like the idea. I think it is last chunk of the puzzle. But it requires more changes then an option.

  1. generic` part must be moved to independent module. (what name to use?). This also makes many other things easier.
  2. Finish this https://github.com/rust-embedded/svd2rust/pull/430

Other notes:

  1. groupName should be used for crate names and existing "feature-group" option.
  2. For folder names hash values could be used instead of peripheral names. With #[path="hash"] mod spi1 attribute. Like in https://github.com/djmcgill/form/pull/24. Advantages:
  • Allows to reuse identical peripherals across different chips.
  • You can skip writting (or even generating if hash is calculated from SVD directly) peripheral if folder already exists.

burrbull avatar Jan 28 '25 06:01 burrbull

So I've tried to implement 1 step but understood that I don't know how to do this. There are 2 issues:

  1. Encapsulation. Some methods of common structures are made private to prevent manual creating. The only solution is to just hide them from docs.
  2. It uses type aliases so you can't add implementation from "outside" (without traits). It can be solved by using "newtypes", which have also other advantages but it creating many new types overloads compiler and it much slower (3 times and bigger). See version 0.14 and previous.

But without this change splitting on different crates makes much less sense as each one will have it's own copy of register access implementation.

I still like the idea as it gives possibility to create and publish independent crates much easier so HALs can reuse them. Like https://github.com/stm32-rs/bxcan

Now to performance issues. I'd want to see on SVDs you are using to generate rust. The best way to improve performance is to reduce of code. You can do it by using register arrays, clusters and by using derivedFrom.

burrbull avatar Jan 29 '25 03:01 burrbull

creating many new types overloads compiler and it much slower (3 times and bigger).

Wow, that is definitely the opposite of the goal of this proposal. Thank you for doing the experiment.

I still like the idea as it gives possibility to create and publish independent crates much easier so HALs can reuse them

I had a similar thought when you brought up the module hashing, like the peripheral crates could be reused by other PACs and HALs.

That may also require parameterizing the base address in some way (either on a Peripheral::new(base_addr: usize), Peripheral<const BASE: usize>, etc.).

The best way to improve performance is to reduce of code

This was also brought up in the meeting yesterday, and it certainly seems like one of the first, most straightforward steps to take in reducing compile times (especially when one has full control over the SVD).

However, I'm already heavily utilizing as much code re-use as possible. There is maybe a bit more re-use I could gain with the SVD attribute for saying a non-contiguous peripheral is based on another definition. For example, there are peripherals that are based on the same IP, but are not in contiguous memory.

I'd want to see on SVDs you are using to generate rust.

The main PAC that I maintain is jh71xx-pac, and I use svd-generator to generate the SVD (the latter is where I could probably make some additional optimizations for code re-use).

Here is a link to the jh71xx-pac SVD

rmsyn avatar Jan 29 '25 06:01 rmsyn

For example, there are peripherals that are based on the same IP, but are not in contiguous memory.

Yes. This is the first you should fix.

For example UART1 should looks like

    <peripheral derivedFrom="uart0">
      <name>uart1</name>
      <groupName>UART</groupName>
      <description>Synopsys DesignWare APB UART: uart1</description>
      <baseAddress>0x10010000</baseAddress>
      <addressBlock>
        <offset>0x0</offset>
        <size>0x10000</size>
        <usage>registers</usage>
      </addressBlock>
      <interrupt>
        <name>UART1</name>
        <value>33</value>
      </interrupt>
    </peripheral>

No registers block. But derivedFrom attribute. Also I recommend to add groupName for all peripherals which specifies peripheral type.

Besides code reduce this also gives you possibility to write generic (over peripheral) code in your HAL instead of macros.

P.S. You could also try to use patched version of form I mentioned above. But this is useful only to reduce size of code across chips. As it will not give you faster compilation but only less of code.

burrbull avatar Jan 29 '25 06:01 burrbull

No registers block. But derivedFrom attribute. Also I recommend to add groupName for all peripherals which specifies peripheral type.

So, I followed your advice, and applied the derivedFrom attribute where applicable.

It shaved off ~72k LoC, and ~17-20 seconds from CI compile time!

Going to try it on a low-resource, multicore SBC now.

This is encouraging. I think if we can further improve with parallelizing the PAC build as much as possible, it will greatly improve compile times.

Locally, there is still a period where there seems to be a single core pinged with other idle cores.

Edit: attempted the build with reduced code on the low-resource SBC. There was a small time-savings, but the debug build still takes ~3min with the majority spent in the final stage compiling jh71xx-pac itself. Again, it is only using a single core at full use, with other cores remaining idle.

Going to try splitting the crate manually to see how much time saving there is.

rmsyn avatar Jan 31 '25 05:01 rmsyn

Reposting a comment I left in the rust-embedded Matrix channel for tracking purposes:

So, I tried a bit of an experiment manually splitting out the peripheral modules into crates.

From the bit of work done, it requires splitting out the generic module into its own crate, and importing it into the peripheral crate(s).

The R and W types can no longer just be aliased (since they now come from an external crate).

Instead, the aliases turn into wrapper types:

pub struct R(generic::R);
pub struct W(generic::W);

impl R {
    pub const fn field(&self) -> FieldR {
        FieldR::new(self.0.bits() & field_mask)
    }
}

impl W {
    pub fn field(&mut self) -> FieldW<FieldSpec> {
        FieldW::new(self, 0)
    }
}

Going to take a bit to do that programmatically in svd2rust and form.

form is likely going to need an option to generate new crates, instead of just modules. It may even require bringing the form functionality internal to svd2rust.

It's not clear at the moment, but one potential option would be to define the peripherals with the following pseudo-Rust:

pub extern crate some_peripheral {
     // alll the peripheral code that would normally go in the module
}

Then, after the form processing, each crate with the above markup would go into its own top-level crate directory. The entry in the main crate would change to:

pub extern crate some_peripheral;

rmsyn avatar Feb 08 '25 01:02 rmsyn

Instead, the aliases turn into wrapper types:

Have you tried to implement it? That is the question.

How do you suggest to avoid code duplication? Of each bits, set_bit, etc.

burrbull avatar Feb 08 '25 04:02 burrbull

Have you tried to implement it? That is the question.

Yes, but only had success with a manual implementation of a single small peripheral (Clint).

I tried some additional experiments with programmatic implementation in svd2rust, and that's where the idea to modify form came up.

How do you suggest to avoid code duplication? Of each bits, set_bit, etc.

That is definitely an issue. It may require a complete restructuring of the generic types into traits, and/or using macros.

Through these experiments, I keep wondering about defining PACs per-peripheral instead of at the SoC level. I know this was not the original intention of SVD, but peripherals are shared across a number of different SoCs. The only thing that may change is the base address of the peripheral, and/or some other attributes dynamically defined by register values (which is not expressible in SVD).

If we could dynamically define the base address of the peripheral, the generated PAC could be shared across different SoCs that share the same peripheral.

It would also allow writing a HAL to the peripheral, a driver, etc. The HAL for a SoC would then be composed of HALs for each of its peripherals.

What do you think? Maybe there is a better place for this discussion?

rmsyn avatar Feb 08 '25 23:02 rmsyn

or using macros

If you are saying about declarative macros that is what stm32ral exactly does. If you are saying about procedural macros

Also there is chiptool which does not use write proxies. Instead of this it defines all field access variants directly on register writers. Try it, maybe it is better for your use cases.

The only thing that may change is the base address of the peripheral, and/or some other attributes dynamically defined by register values (which is not expressible in SVD).

How much simpler everything would be if this were true.

What do you think? Maybe there is a better place for this discussion?

I don't mind it.

burrbull avatar Feb 09 '25 05:02 burrbull

If you are saying about declarative macros that is what stm32ral exactly does. If you are saying about procedural macros

I'll keep looking at stm32ral, but I don't really think the same approach would work for the wrapped R + W types.

I am thinking more of a macro that handles the boilerplate for accessing the wrapped inner R + W types. I can work on getting a working example in code to clarify.

Also there is chiptool which does not use write proxies.

Right, I've considered using chiptool, but I think it also generates one large crate with all of the peripherals. So, it wouldn't really help with the overall build-time issues (unless there is some other magic they do to parallelize builds).

How much simpler everything would be if this were true.

Yeah... I opened an issue with the upstream CMSIS-SVD spec repository about expanding the spec to handle dynamically defining base addresses. Almost a year, no response...

Or, are you saying that peripherals on different SoCs are not interchangeable, even if they are based on the same IP (read same compatible string in a DeviceTree)?

rmsyn avatar Feb 11 '25 03:02 rmsyn

I can imaging only Reg part to be made generic without total reorganization. Approximately one third of generic.rs

See https://github.com/burrbull/reg

Other things like field/register readers and writers should be copied anyway.

burrbull avatar Feb 11 '25 06:02 burrbull

If we could dynamically define the base address of the peripheral, the generated PAC could be shared across different SoCs that share the same peripheral.

It would also allow writing a HAL to the peripheral, a driver, etc. The HAL for a SoC would then be composed of HALs for each of its peripherals.

What do you think? Maybe there is a better place for this discussion?

Maybe this would also be a viable new issue, but I think this is super interesting for a project I am currently planning. I am trying to create basic PACs (and maybe HAL if that works?) for the the Xilinx Zynq 7000 SoC. The Xilinx SoCs are hybrid systems with both ARM cores and FPGAs on the same fabric. It is very common on these systems to dynamically create hardware on the FPGA fabric like UART or SPI softcores through IP designs. The IP generally also has registers documented in its respective documentation, and the softcores have a defined register base address and are mapped into the project AXI bus directly. That means that I would also have to generate some kind of dynamic PAC, in addition to the fixed PAC for the ARM cores.. I have not looked at this in more detail, but some thoughts about this:

  • I want to somehow provide something like a library/crate for a specific UART softcore, for example https://www.xilinx.com/products/intellectual-property/axi_uart16550.html#overview
  • I want to provide an API like the one specified here: https://docs.rs/svd2rust/latest/svd2rust/#peripheral-api
  • The softcores have a dynamic base address. svd2rust mostly has to use/process/geenerate code for fixed base addresses of peripheral blocks if I understand correctly.

If a user now wants to use Rust to drive the softcore, they would have to somehow instantiate the peripheral API on a dynamic base address I mentioned earlier because those addresses are configurable via the FPGA design. Maybe I could at least re-use https://github.com/rust-embedded/svd2rust/blob/master/src/generate/generic.rs for first experiments?

robamu avatar Feb 17 '25 23:02 robamu

Maybe this would also be a viable new issue, but I think this is super interesting for a project I am currently planning.

I was thinking the same thing, dynamic peripheral base addresses are related, but separate from splitting peripherals into their own crates.

If @burrbull thinks it deserves its own issue, I can open a new issue.

If a user now wants to use Rust to drive the softcore, they would have to somehow instantiate the peripheral API on a dynamic base address I mentioned earlier because those addresses are configurable via the FPGA design.

Sounds like another great use-case for a configurable peripheral base address. The main issue I see, it would be a svd2rust-specific extension of the CMSIS-SVD spec. Maybe someone with a bit more clout in the ARM space could request feedback on adding it to the upstream specification?

Otherwise, I'm not sure how much of a blocker having a svd2rust-specific extension would be? I think if we document whatever syntax we use in the SVD to indicate the extension, it should make it easy enough for maintainers, and any other projects that may want to pick it up, e.g. other svd2<some-lang> codegen projects.

rmsyn avatar Feb 18 '25 13:02 rmsyn

@robamu when you say "dynamically" does it mean the address is not known on compile time or just that it can take any value?

That may also require parameterizing the base address in some way (either on a Peripheral::new(base_addr: usize), Peripheral<const BASE: usize>, etc.).

One of those? See also https://github.com/rust-embedded/svd2rust/pull/910

Sounds like another great use-case for a configurable peripheral base address. The main issue I see, it would be a svd2rust-specific extension of the CMSIS-SVD spec. Maybe someone with a bit more clout in the ARM space could request feedback on adding it to the upstream specification?

What kind of extension do you mean? How should it look like?

burrbull avatar Feb 18 '25 14:02 burrbull

@robamu when you say "dynamically" does it mean the address is not known on compile time or just that it can take any value?

The value would be known compile time when generating / writing / using the dynamic PAC. The user can edit the address in the FPGA design, but usually, one will just take an auto-assigned address and it will generally stay fixed after that.

robamu avatar Feb 18 '25 14:02 robamu

What kind of extension do you mean? How should it look like?

In code, it could look something like:

pub struct Uart<const BASE: usize> {
    _marker: PhantomData<*const uart::RegisterBlock>,
}

impl<const BASE: usize> Uart<BASE> {
    pub const PTR: *const uart::RegisterBlock = BASE as *const _;
    // ...
}

This is very close to your implementation in #910, so using your generic peripheral:

pub type Uart<const BASE: usize> =  Periph<uart::RegisterBlock, BASE>;

In the SVD, it could look something like:

<peripheral>
  <name>uart</name>
    <baseAddressDynamic>0x01000000</baseAddressDynamic>
    <!-- ... -->
</peripheral>

I'm not tied to the name of the baseAddressDynamic attribute, or if we want to provide a default address value there.

If we supplied a default value, that would allow for a default instantiation which we could alias to look like the current API.

However, with the dynamic API, users could define their own implementation.

We could also introduce an alignment attribute, to ensure we could enforce that invariant in the generic structure.

<peripheral>
  <name>uart</name>
    <baseAddressDynamic>0x01000000</baseAddressDynamic>
    <baseAddressAlign>0x4</baseAddressAlign>
    <!-- ... -->
</peripheral>

pub struct Uart<const BASE: usize> {
    _marker: PhantomData<*const uart::RegisterBlock>,
}

impl<const BASE: usize> Uart<BASE> {
    pub const PTR: *const uart::RegisterBlock = BASE as *const _;
    pub const _ALIGN: () = assert_eq!(BASE % 0x4, 0);
    // ...
}

pub type Uart0 = Uart<0x0100_0000>;

With your implementation in #910, there may be no need for a SVD baseAddressDynamic field.

Then, the user is free to define their own type aliases.

rmsyn avatar Feb 19 '25 02:02 rmsyn

@burrbull looks like we have another example in a similar context (Rust codegen from SQL), of using separate crates drastically decreasing compile time:

https://www.feldera.com/blog/cutting-down-rust-compile-times-from-30-to-2-minutes-with-one-thousand-crates

rmsyn avatar Apr 16 '25 11:04 rmsyn

This would probably be benefitial to this proposal: https://github.com/rust-lang/rust/issues/122349

I also remember some discussion about allowing multiple compilation units with crate sub_crate_name { ... } but can't seem to find it

Emilgardis avatar Apr 16 '25 12:04 Emilgardis