embedded-sdmmc-rs add async support using bisync crate

This moves the existing API into a blocking module and duplicates the API into a new asynchronous module, using async fn where applicable. This uses the bisync crate rather than maybe_async used by #154. This involved moving everything in the src directory into a new src/inner directory, then exposing blocking/async versions via how they are imported in src/lib.rs.

I think this resolves the concerns brought up in #154: sync and async can both be built at the same time, usage of bisync by other crates in the dependency graph doesn't affect this, and changes to the example and test code are minimal (just adjusting import path).

I moved submodules that have no difference between blocking & async to a new common module. That code gets re-exported under the blocking/asynchronous modules in the same relative path in the module hierarchy as before. There may be opportunities to split more code into that common module, but I have tried to keep changes to a minimum for now to make it easier to review.

I have not added Cargo features for sync/async; they're both built. If requested, I could put each behind a feature gate, though I don't think it's necessary.

Fixes https://github.com/rust-embedded-community/embedded-sdmmc-rs/issues/50

Feb 06 '25 03:02 Be-ing

It might also be good to move the example import renames to another PR to keep this one cleaner so we can see precisely only that which is affected by the async additions.

Feb 06 '25 20:02 thejpster

It might also be good to move the example import renames to another PR to keep this one cleaner so we can see precisely only that which is affected by the async additions.

That goes towards a bigger question: do you want to break the public API by moving the existing API into a new blocking module? This can be avoided by making the blocking module private and instead putting pub use blocking::*; in src/lib.rs. I don't think that would be great for downstream users in the long term though. My primary concern is that it would make the documentation confusing to navigate. Here's how the front page of the documentation would look with that:

There's the whole blocking API of the crate, then the asynchronous module... and when you go to the documentation for that, you see the same API:

I think this would be rather confusing. It's difficult to even tell that I clicked a link and navigated to another page because it looks so similar to the documentation's front page.

By moving the existing API into a blocking module (how this branch currently is), the front page of the documentation looks like this:

Feb 06 '25 20:02 Be-ing

I've avoided running cargo fmt for now to minimize the diff because it's rather big already. If you'd like that to be run now, just let me know, or I could do it later after you've reviewed.

Feb 06 '25 20:02 Be-ing

I had to use the #[only_sync] macro in a few places. Those are the impl Drop for File and Volume; unfortunately, there is no such thing (yet) in Rust as an async drop, so that's the best we can do. VolumeManager::close_file and VolumeManager::close_file can be invoked manually though. I presume bad things would happen if those aren't closed manually? If so, I can add a note to the documentation that has to be done manually for the async versions.

The other places I used #[only_sync] were just to make the tests simple to implement.

Feb 06 '25 20:02 Be-ing

I presume bad things would happen if those aren't closed manually?

Yes, leaking a file handle means you can't reopen it, and handles are a finite resource. Plus certain context is only written to disk on file close so the disk ends up needing a fsck/scandisk pass.

As long as that only affects the async users that's fine.

Moving the blocking API into the blocking module is fine with me.

Feb 07 '25 07:02 thejpster

Can you rebase? #177 might make this smaller.

Feb 09 '25 17:02 thejpster

Is it possible to give the async version a blocking drop? Better than doing nothing.

Feb 09 '25 17:02 thejpster

Can you rebase? https://github.com/rust-embedded-community/embedded-sdmmc-rs/pull/177 might make this smaller.

Okay, rebased.

Is it possible to give the async version a blocking drop? Better than doing nothing.

Unfortunately not. That would require this crate to bring in its own async executor and set that up just within impl Drop. It's questionable if that would be desirable if it was even possible. In a library that uses std, you can wrap async functions with a simple async executor like futures_lite::future::block_on or pollster::block_on, but both of those use std modules like std::future and std::sync that aren't available in core.

Feb 12 '25 07:02 Be-ing

Is there no no_std block_on! ? Can't you just spin polling the Future?

Feb 12 '25 12:02 thejpster

I just found embassy_futures::block_on, however, I don't think it's a good idea to use it to impl Drop for async Files/Volumes. We can't know if a user would want that and there's no way to opt out of Drop::drop getting called if it's implemented. For example, the user could have strict power consumption requirements that require putting the MCU to sleep on every .await as Embassy and RTIC do.

Feb 12 '25 18:02 Be-ing

But it's probably better than a corrupt file system, and any correct program will close them asynchronously before dropping.

What do other async File APIs do?

Feb 12 '25 21:02 thejpster

tokio::fs::File:

A file will not be closed immediately when it goes out of scope if there are any IO operations that have not yet completed. To ensure that a file is closed immediately when it is dropped, you should call flush before dropping it. Note that this does not ensure that the file has been fully written to disk; the operating system might keep the changes around in an in-memory buffer. See the sync_all method for telling the OS to write the data to disk.

async-std::fs::File:

Files are automatically closed when they get dropped and any errors detected on closing are ignored. Use the sync_all method before dropping a file if such errors need to be handled.

impl Drop for File in async-std:

impl Drop for File {
    fn drop(&mut self) {
        // We need to flush the file on drop. Unfortunately, that is not possible to do in a
        // non-blocking fashion, but our only other option here is losing data remaining in the
        // write cache. Good task schedulers should be resilient to occasional blocking hiccups in
        // file destructors so we don't expect this to be a common problem in practice.
        let _ = futures_lite::future::block_on(self.flush());
    }
}

Suggestion: impl Drop for async File/Volume using embassy_futures::block_on, but put that behind an async_drop Cargo feature that is enabled by default. This will Just Work by default, and any project where that little bit of blocking is really a problem has an escape hatch.

Feb 12 '25 22:02 Be-ing

Suggestion: impl Drop for async File/Volume using embassy_futures::block_on, but put that behind an async_drop Cargo feature that is enabled by default. This will Just Work by default, and any project where that little bit of blocking is really a problem has an escape hatch.

Nevermind, I forgot core::mem::forget exists to opt out of Drop, and of course File/Volume::close already use it. There's no need for a Cargo feature to prevent dropping; that can be opted out of just by calling File/Volume::close.

Feb 12 '25 23:02 Be-ing

Some blogs about async drop: https://without.boats/blog/asynchronous-clean-up/ https://sabrinajewson.org/blog/async-drop https://smallcultfollowing.com/babysteps/blog/2023/03/16/must-move-types/

TL;DR: It's complicated and Rust is a long way from async drop being implemented. Probably not for some more years.

Blocking in Drop is an ugly workaround, but I think it's sufficient for now.

Feb 15 '25 05:02 Be-ing

You'll need to handle dropping Directories too.

Feb 15 '25 13:02 thejpster

Directory::drop doesn't make any async calls, so there's nothing to do there. The existing implementation is already enough.

Feb 15 '25 19:02 Be-ing

Ah! Yes you're right.

Feb 15 '25 22:02 thejpster

Ok, I need to find time to test this!

Feb 15 '25 22:02 thejpster

Still on my list - maybe after Embedded World

Mar 02 '25 19:03 thejpster

OK, this looks fine. @Be-ing, if you can run a cargo fmt and rebase, let's get it done.

Mar 16 '25 13:03 thejpster

I just ran into a panic while using this branch when drop is called on an open volume. I'm using the embassy executor.

When drop is called I get this panic message

0.052407 ERROR panicked at 'Found waker not created by the Embassy executor. `embassy_time::Timer` only works with the Embassy executor.'
└─ embassy_executor::raw::waker::task_from_waker @ /home/alexander/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/embassy-executor-0.7.0/src/raw/waker.rs:38  
0.052459 ERROR panicked at /home/alexander/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/defmt-0.3.10/src/lib.rs:380:5:
explicit panic
└─ panic_probe::print_defmt::print @ /home/alexander/.cargo/registry/src/index.crates.io-1949cf8c6b5b557f/panic-probe-0.3.2/src/lib.rs:104

Calling block_on from within an embassy executor is allowed by itself, so there must be something within the code that's called from drop that causes this. Or it's a bug in the embassy executor. The panic message is not very helpful since this crate doesn't depend on embassy-time.

I didn't dig any deeper yet but I created this repo that reproduces the problem: https://github.com/avsaase/embedded-sdmmc-async-drop.

Mar 20 '25 21:03 avsaase

This is a known issue if I remember correctly, it's intended behavior and the code from the user main should use the feature generic-queue-x. This is documented in the embassy-time documentation. It is maybe useful to add a note for that on the README and docs, but I'm not sure if this library should provide additional feature flags or other stuff to manage that.

Mar 20 '25 23:03 ValouBambou

Okay, so what happens is that the async DelayNs implementation of embassy_time::Delay awaits a timer which causes a panic when run in embassy_futures::block_on if generic-queue-x is not enabled. It would definitely be good to document this somewhere but I wonder if it's possible to avoid the problem altogether. One possible solution is to require users of the async api to pass in an implementer of both embedded_hal::delay::DelayNs and embedded_hal_async::delay::DelayNs and use the blocking DelayNs in the Drop impl. I haven't checked yet how pervasive this change would be in this code base. I can have a try over the next few days but it would be nice to hear from @thejpster and @Be-ing if they think this is good approach before I spent half my weekend on it.

Mar 21 '25 09:03 avsaase

Thanks for testing and digging into this @avsaase and @ValouBambou! I documented the requirement for enabling a generic-queue-x feature on the embassy-time crate when dropping a File or Volume from an Embassy executor.

One possible solution is to require users of the async api to pass in an implementer of both embedded_hal::delay::DelayNs and embedded_hal_async::delay::DelayNs and use the blocking DelayNs in the Drop impl. I haven't checked yet how pervasive this change would be in this code base. I can have a try over the next few days but it would be nice to hear from @thejpster and @Be-ing if they think this is good approach before I spent half my weekend on it.

That's an interesting idea, but I suspect it would be cumbersome for users, but perhaps more importantly, cumbersome to maintain. It's difficult to say whether it would be worthwhile and how pervasive the code changes would need to be before it's implemented. I suspect it wouldn't be worth the trouble because the issue is avoidable by manually calling close instead of letting the object drop. Ironically, the panic could be helpful to ensure that the File/Volume is not dropped accidentally instead of calling close.

Mar 23 '25 22:03 Be-ing

I'm trying to get this working with Embassy on STM32F3 (STM32F3DISCOVERY with STM32F3VCT6) with a micro-SD card breakout board, but I'm having trouble with the basics (num_bytes() returns CardNotFound error). Here's my example repo: https://github.com/Phirks/embassy-stm32f3-sd-dfu-example

When I check the communication, I'm sending 80 clock pulses then repeatedly CMD0 and getting back 0xFE and 0x07 over and over until it times out and sends a CardNotFound error. It looks like it's expecting a 0x01. I've tried setting the retries to 50,000 and it never works. I've tried a few different sd cards of 2 different brands. When I pull the card I just get 0xFF forever, which makes sense, and when I put a new card in, sometimes I get 0xFF for a while until it starts returning 0xFE and 0x07 again. SD card response

I believe everything is set up here according to the docs. I had heard that clock cycles starting before the CS pin is set and remaining on after would be helpful, but I don't see how that would be controllable by a user with this interface.

Mar 24 '25 04:03 Nicholas-L-Johnson

That's an interesting idea, but I suspect it would be cumbersome for users, but perhaps more importantly, cumbersome to maintain. It's difficult to say whether it would be worthwhile and how pervasive the code changes would need to be before it's implemented. I suspect it wouldn't be worth the trouble because the issue is avoidable by manually calling close instead of letting the object drop. Ironically, the panic could be helpful to ensure that the File/Volume is not dropped accidentally instead of calling close.

I had a look though the code to see how this could be implemented and I have to say I agree, it's not a good idea. You would need to encode if each function is called for regular use or from Drop, and pass this along with every function call. Not nice.

The non-descriptive panic message from embassy-time is a bit unfortunate but not something that can be fixed here.

Mar 24 '25 15:03 avsaase

but I don't see how that would be controllable by a user with this interface.

~~It's not. You have to drive the SPI bus manually to do this. Some cards let you get away without it but it's in the spec and some cards require it. We can't do it for you without violating embedded-hal API design and we got told off for doing that.~~

Ok, I misunderstood. I looked at your code and it seems OK.

Can you try:

using the previous release of this crate and its blocking API
Using micropython or some other well-known SD card implementation to rule out wiring errors or electrical problems.

Mar 26 '25 07:03 thejpster

I wanted to update that I got this async working on an ESP32-S3 based board - WT32-SC01-Plus. I've been using the standard crate for quite some time but the async was really missing. So thanks for the hard work getting this Async done!

Along the way I had some issue that I'd like to understand before I fully switch to using Async.

When I started working, the Blocking version of my code worked fine and the Async failed with the following log:

WARN - Got response: 52, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..

> this went on about 50 lines

And then I receive an error:

====================== PANIC ======================
panicked at src/main.rs:142:72:
called `Result::unwrap()` on an `Err` value: DeviceError(CardNotFound)

Backtrace:

0x4200a2c9
0x4200a2c9 - esp32s3_async_sdmmc::____embassy_main_task::{{closure}}
    at ??:??
0x4200621a
0x4200621a - embassy_executor::raw::TaskStorage<F>::poll
    at /Users/yanshay/.cargo/registry/src/index.crates.io-6f17d22bba15001f/embassy-executor-0.7.0/src/raw/mod.rs:214
0x4200615d
0x4200615d - embassy_executor::raw::SyncExecutor::poll::{{closure}}
    at /Users/yanshay/.cargo/registry/src/index.crates.io-6f17d22bba15001f/embassy-executor-0.7.0/src/raw/mod.rs:430
0x4200d363
0x4200d363 - esp32s3_async_sdmmc::__xtensa_lx_rt_main
    at /Users/yanshay/Trials/esp32/esp32s3-async-sdmmc/src/main.rs:65
0x42013dd6
0x42013dd6 - Reset
    at /Users/yanshay/.cargo/registry/src/index.crates.io-6f17d22bba15001f/xtensa-lx-rt-0.18.0/src/lib.rs:82
0x40378c48
0x40378c48 - ESP32Reset
    at /Users/yanshay/.cargo/registry/src/index.crates.io-6f17d22bba15001f/esp-hal-0.23.1/src/soc/esp32s3/mod.rs:165

Sometimes it did work, usually if I ran the blocking version before, and only after that the async version if I didn't power the device down and only flashed the async version code (so the sync initialization seem to have initialized the device properly for the async to work). At least so it seemed, I made many runs so things weren't consistent.

Here is an example of the log when the file read succeeded, and when successful the sequence is always showed the exact same logs:

WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 70, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 0, trying again..
WARN - Got response: 55, trying again..
WARN - Got response: 8, trying again..
WARN - Got response: 3f, trying again..
INFO - File content: This is content of file a.txt

And when it worked, I later had an embassy panic regarding waker that's not an embassy-executor waker. That was until I realized I need to call close().await. The panic was on the Volume drop.

Then I made some changes, I'm not sure what, but they weren't that significant for initialization, and I think the only significant change is that I noticed I need to do close().await on the objects so not get that embassy panic.

I think that's when things started working fine and the above logs also disappeared and things started to work consistently. But I only think that, because I did all kinds of minor changes to the code while trying to get it to work.

So my question is - could it be that not closing all objects properly using close().await caused some inconsistent state with the SDCard that would appear also in the initialization of the next run if device wasn't not turned off / or even if it was turned off? I'm just trying to figure out if there's some reliability issue under certain circumstances and want to understand what caused the many failed runs above along the way before I fully switch to using async mode.

Mar 28 '25 11:03 yanshay

That's a pretty bad failure mode and makes me wonder if we should have the option to Drop File and Volume. We could panic in those Drop impls with a helpful error messages saying to use close().await instead. Maybe we could make that the default Drop impl and add a Cargo feature embassy-time-generic-queue to opt into the block_on impls of Drop?

Thanks for all the testing and feedback everyone, it's is very useful!

Mar 28 '25 17:03 Be-ing

That's a pretty bad failure mode and makes me wonder if we should have the option to Drop File and Volume. We could panic in those Drop impls with a helpful error messages saying to use close().await instead. Maybe we could make that the default Drop impl and add a Cargo feature embassy-time-generic-queue to opt into the block_on impls of Drop?

Thanks for all the testing and feedback everyone, it's is very useful!

That would have definitely been helpful and save quite some time. I would have never thought of the close if I didn't go through this discussion before and even with that I initially missed the await, only the warning made me notice that.

Other than that, could missing the proper close be the cause for the failures I saw about CardNotFound? The drops come later so I wonder if the failures were due to not closing properly on earlier executions but I think it took place also right after power up of the device, only it doesn't any longer.

Mar 28 '25 17:03 yanshay