embedded-sdmmc-rs icon indicating copy to clipboard operation
embedded-sdmmc-rs copied to clipboard

Add support for multiple contiguous-block reads, without the extra overhead of additional `data-copying`

Open nihalpasham opened this issue 3 years ago • 11 comments

Added a method named read_multi - reads multiple contiguous blocks of a file in a single read operation, without the extra overhead of additional data-copying of the existing read method.

It has the same effect as [Controller<D, T>::read] but reduces read time by more than 50%, especially in the case of large files (i.e. > 1Mb)

nihalpasham avatar Jan 13 '22 08:01 nihalpasham

This is an interesting enhancement. However, it requires the slice_as_chunks feature so it requires Rust nightly. It is a big performance boost, though, so maybe it is worth putting behind a nightly feature.

Ok. How do I conditionally add unstable features to a stable build? Is there a standard way to do this.

nihalpasham avatar Feb 04 '22 10:02 nihalpasham

That is not possible, nor should it. However, we can define a new feature like unstable (sounds better than nightly) or a similar name, that enables features only available on Rust nightly and thus can only be enabled when using it or it will fail to compile. If/when the feature makes it to stable, we can make it just a no-op. You can do that with:

#![cfg_attr(feature = "unstable", feature(slice_as_chunks))]

This feature would also need to be documented both in the docs and in the README.

eldruin avatar Feb 04 '22 12:02 eldruin

Ok, I've added the following

  • a new feature named unstable (along with the cfg_attr attribute).
  • README documentation

It now compiles on stable and nightly (with the unstable feature-flag set). Although, I'm a little unclear as to why this will be a no-op when the feature makes it to stable. The unstable feature-flag that decorates read_multi would still need to be removed before we use it in stable-rust -right?

PS: I noticed rust-analyzer has issues when setting

#![cfg_attr(feature = "unstable", feature(slice_as_chunks))]

i.e. go to definitions, references within read_multi don't work. I am assuming this is a rust-analyzer bug.

nihalpasham avatar Feb 04 '22 13:02 nihalpasham

Looks good. The CI here needs to be updated to test this feature on rust nightly, though. Would you be able to do that as well? (Also, there is a bit of stray whitespace, which is making the build fail) I mean if/when this feature is available on rust stable we can make a new non-breaking release where we make the unstable feature just do nothing and we make read_multi simply available to everybody. I assume what you described is a rust-analyzer problem as well.

eldruin avatar Feb 04 '22 14:02 eldruin

Fixed the formatting error. Added an additional build step (named Nightly Build) to the existing build-workflow. I hope I got this right (never done this before).

nihalpasham avatar Feb 05 '22 05:02 nihalpasham

Kind of forgot about this. Is this good to go?

nihalpasham avatar Mar 03 '22 03:03 nihalpasham

Can we emulate 'read_multiple' on the block device trait, thus avoiding the need for a feature flag? You could have a default method on the trait which calls the single-block-read function in a loop and memcpy's the data into the given buffer.

thejpster avatar Mar 04 '22 10:03 thejpster

Assuming I understood your question correctly, emulating read_multi by looping over multiple single_block_read calls technically works but then we don't get the performance

  • as hardware (i.e. block-device controller) will just end up issuing multiple single-read calls.
  • and we still have to deal with additional data-copying. Here's an example, that avoids additional data-copies.

The extra feature-flag is only required to convert a [[u8; 512]] slice into a slice of Block(s). I used a nightly only feature here but its possible to implement this ourselves.

nihalpasham avatar Mar 04 '22 11:03 nihalpasham

But you would get the performance if the BlockDevice implementation replaced the default read_multiple implementation with a high-performance version. And a user of the BlockDevice trait (i.e. the FAT filesystem code) wouldn't have to know or care about that.

thejpster avatar Mar 08 '22 15:03 thejpster

So,

  • yes, a block device implementation can add support for reading multiple-blocks, if the underlying hardware supports it and it is usually decoupled from the filesystem.
  • but for it to be performant i.e. make use of the multiple block-read functionality, its going to need a higher layer (such as the filesystem) to tell it which set of contiguous blocks are to be read.
  • If this info is not provided, then a block device implementation usually falls back to single block reads (much like the current implementation of [Controller<D, T>::read])

nihalpasham avatar Mar 08 '22 17:03 nihalpasham

So my thinking is:

  1. The file system always asks for contiguous blocks where it can.
  2. The block device provides an accelerated multi-block-read function where possible.
  3. The block device trait provides a default implmentation of the multi-block-read function which just calls the single-block-read function in a loop.

The same applies to writes.

thejpster avatar Mar 09 '22 09:03 thejpster

I'm going to close this, because we didn't get it out of WIP and merged before a whole bunch of stuff happened. Sorry, that's my fault. Feel free to rebase on current head and open a new one.

thejpster avatar Jun 11 '23 09:06 thejpster

I do want to revisit this though, because I think it is useful to be able to pull larger blocks from the transport in one go.

thejpster avatar Jun 11 '23 09:06 thejpster