CH32V003F4P6 `embassy_blinky` example works briefly then appears to hang

Open DazWilkin opened this issue 2 months ago • 9 comments

Thank you for developing this HAL!

Rust and CH32V003 noob so -- feel free -- to tell me to go away.

I've a CH32V003F4P6-R0-1v1 and a WCH-LinkE-RO-1v3.

I'm exploring the examples.

blinky works and embassy_blinky works but only briefly (and I'm unsure how to debug).

// PC4 is connected to LED2
spawner.spawn(blink(p.PC4.degrade(), 250)).unwrap();
// PD6 is connected to LED1
spawner.spawn(blink(p.PD6.degrade(), 333)).unwrap();

I've tried with Rust 2021 and 2024.

I do need to uncomment/enable Tlink.x for successful compilation:

.cargo/config.toml:

[target."riscv32ec-unknown-none-elf"]
rustflags = [
    "-C", "link-arg=-Tlink.x",
]

cargo run --bin=embassy --release
    Finished `release` profile [optimized] target(s) in 0.13s
     Running `wlink -v flash --enable-sdi-print --watch-serial target/riscv32ec-unknown-none-elf/release/embassy`
21:38:42 [DEBUG] (1) wlink::usb_device::libusb: Serial number: "0D548F069286"
21:38:42 [INFO] Connected to WCH-Link v2.18(v38) (WCH-LinkE-CH32V305)
21:38:42 [INFO] Attached chip: CH32V003 [CH32V003F4P6] (ChipID: 0x00300510)
21:38:42 [INFO] Chip ESIG: FlashSize(16KB) UID(cd-ab-70-95-2e-bd-6b-fe)
21:38:42 [INFO] Flash protected: false
21:38:42 [INFO] Read target/riscv32ec-unknown-none-elf/release/embassy as ELF format
21:38:42 [DEBUG] (1) wlink::firmware: Found loadable segment, physical address: 0x00000000, virtual address: 0x00000000, flags: 0x5
21:38:42 [DEBUG] (1) wlink::firmware: Matching section: ".vector_table" offset: 0x1000 size: 0x320
21:38:42 [DEBUG] (1) wlink::firmware: Matching section: ".text" offset: 0x1320 size: 0x277a
21:38:42 [DEBUG] (1) wlink::firmware: Section names: [".vector_table", ".text"]
21:38:42 [DEBUG] (1) wlink::firmware: Found loadable segment, physical address: 0x00002a9c, virtual address: 0x00002a9c, flags: 0x4
21:38:42 [DEBUG] (1) wlink::firmware: Matching section: ".rodata" offset: 0x3a9c size: 0xf00
21:38:42 [DEBUG] (1) wlink::firmware: Section names: [".rodata"]
21:38:42 [DEBUG] (1) wlink::firmware: Found loadable segment, physical address: 0x000039a0, virtual address: 0x20000000, flags: 0x6
21:38:42 [DEBUG] (1) wlink::firmware: Matching section: ".data" offset: 0x5000 size: 0x30
21:38:42 [DEBUG] (1) wlink::firmware: Section names: [".data"]
21:38:42 [DEBUG] (1) wlink::firmware: found 3 sections
21:38:42 [DEBUG] (1) wlink::firmware: Merge firmware sections with gap: 2
21:38:42 [DEBUG] (1) wlink::firmware: Merge firmware sections with gap: 4
21:38:42 [INFO] Flashing 14800 bytes to 0x08000000
21:38:42 [DEBUG] (1) wlink::operations: Reattach chip
21:38:42 [DEBUG] (1) wlink::operations: Reattach chip
21:38:42 [INFO] Read protected: false
21:38:42 [DEBUG] (1) wlink::operations: Using write pack size 1024 data pack size 64
21:38:42 [DEBUG] (1) wlink::operations: Flash OP written
14800/14800
21:38:44 [DEBUG] (1) wlink::operations: Fastprogram done
21:38:44 [INFO] Flash done
21:38:45 [INFO] Now reset...
21:38:45 [INFO] Now connect to the WCH-Link serial port to read SDI print
21:38:45 [DEBUG] (1) wlink::probe: Opening serial port: "/dev/ttyACM0"

NOTE: I don't have riscv64-unknown-elf-size, so I can't check section sizes directly. Flashing 14800 bytes to 0x08000000 seems okay, but perhaps it is exceeding flash?
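The flash question can be answered from the wlink log alone. The following is a minimal sketch, not part of the project: the function names are made up for illustration, and the numbers are copied from the `FlashSize(16KB)`, `Matching section` and `Merge firmware sections with gap` lines above. Adding the gaps to the section sizes is an assumption about how wlink arrives at its total, although the sum does match the 14800 bytes it reports.

```rust
/// Flash size reported by the `Chip ESIG: FlashSize(16KB)` log line.
const FLASH_BYTES: u32 = 16 * 1024;

/// Section sizes from the `Matching section` log lines:
/// .vector_table, .text, .rodata, .data.
const SECTION_BYTES: [u32; 4] = [0x320, 0x277a, 0xf00, 0x30];

/// Padding from the two `Merge firmware sections with gap` log lines.
const GAP_BYTES: [u32; 2] = [2, 4];

/// Total image size: all sections plus the merge gaps (assumed to be additive).
fn image_size(sections: &[u32], gaps: &[u32]) -> u32 {
    sections.iter().sum::<u32>() + gaps.iter().sum::<u32>()
}

/// Free flash left after the image, or `None` if the image does not fit.
fn flash_headroom(image: u32, flash: u32) -> Option<u32> {
    flash.checked_sub(image)
}

fn main() {
    let image = image_size(&SECTION_BYTES, &GAP_BYTES);
    match flash_headroom(image, FLASH_BYTES) {
        Some(free) => println!("image {} B, flash {} B, free {} B", image, FLASH_BYTES, free),
        None => println!("image {} B does not fit in {} B of flash", image, FLASH_BYTES),
    }
}
```

By this count the image fits with 1584 bytes to spare, so flash overflow looks unlikely to be the cause. This says nothing about RAM usage.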

2025-10-03 14:38:45.464: ) }
2025-10-03 14:38:45.464: H
2025-10-03 14:38:45.464: H
2025-10-03 14:38:45.679: L
2025-10-03 14:38:45.760: L
2025-10-03 14:38:45.925: H
2025-10-03 14:38:46.089: H
2025-10-03 14:38:46.172: L
2025-10-03 14:38:46.418: L
2025-10-03 14:38:46.419: H
2025-10-03 14:38:46.666: L
2025-10-03 14:38:46.747: H

NOTE: The number of successful iterations varies; reducing to a single LED and task is less (!) reliable and hangs after fewer iterations.

The first couple of lines of output are swallowed by wlink but using e.g. minicom, these are correctly output (pretty-printed):

CHIP signature => CH32V003F4P6
Clocks Clocks {
    sysclk: Hertz(48000000),
    hclk: Hertz(48000000),
    pclk1: Hertz(48000000),
    pclk2: Hertz(48000000),
    pclk1_tim: Hertz(48000000),
    pclk2_tim: Hertz(48000000)
}

Files:

.cargo/config.toml:

[build]
target = "riscv32ec-unknown-none-elf.json"

[target.'cfg(all(target_arch = "riscv32", target_os = "none"))']
runner = "wlink -v flash --enable-sdi-print --watch-serial"

[target."riscv32ec-unknown-none-elf"]
rustflags = [
    "-C", "link-arg=-Tlink.x",
]

[unstable]
build-std = ["core"]

Cargo.toml:

[package]
name = "ch32v003-blinky"
version = "0.0.1"
edition = "2024"

[[bin]]
name = "blinky"
path = "src/bin/blinky.rs"
test = false
bench = false

[[bin]]
name = "embassy"
path = "src/bin/embassy.rs"
test = false
bench = false

[dependencies]
ch32-hal = { git = "https://github.com/ch32-rs/ch32-hal", features = [
    "ch32v003f4u6",
    "embassy",
    "memory-x",
    "rt",
    "time-driver-tim2",
] }
embassy-executor = { version = "0.7.0", features = [
    "arch-spin",
    "executor-thread",
    "task-arena-size-256", # or better use nightly, but fails on recent Rust versions
] }
embassy-time = "0.4.0"
embedded-hal = "1.0.0"
panic-halt = "1.0.0"
qingke = "*"
qingke-rt = "0.5.0"

[profile.dev]
strip = false
lto = false
opt-level = "s"

[profile.release]
opt-level = "z"
lto = true
codegen-units = 1
strip = false
overflow-checks = false
panic = "abort"

embassy.rs:

#![no_std]
#![no_main]
#![feature(type_alias_impl_trait)]
#![feature(impl_trait_in_assoc_type)]

use ch32_hal as hal;
use embassy_executor::Spawner;
use embassy_time::Timer;
use hal::gpio::{AnyPin, Level, Output, Pin};
use hal::println;

#[embassy_executor::task(pool_size = 2)]
async fn blink(pin: AnyPin, interval_ms: u64) {
    let mut led = Output::new(pin, Level::Low, Default::default());

    loop {
        led.set_high();
        println!("H");
        Timer::after_millis(interval_ms).await;
        led.set_low();
        println!("L");
        Timer::after_millis(interval_ms).await;
    }
}

#[embassy_executor::main(entry = "qingke_rt::entry")]
async fn main(spawner: Spawner) -> ! {
    hal::debug::SDIPrint::enable();
    let mut config = hal::Config::default();
    config.rcc = hal::rcc::Config::SYSCLK_FREQ_48MHZ_HSI;
    let p = hal::init(config);

    println!("CHIP signature => {}", hal::signature::chip_id().name());
    println!("Clocks {:?}", hal::rcc::clocks());

    // PC4 is connected to LED2
    spawner.spawn(blink(p.PC4.degrade(), 250)).unwrap();
    // PD6 is connected to LED1
    spawner.spawn(blink(p.PD6.degrade(), 333)).unwrap();

    loop {
        Timer::after_millis(1000).await;
        // println!("tick");
    }
}

#[panic_handler]
fn panic(info: &core::panic::PanicInfo) -> ! {
    let _ = println!("\n\n\n{}", info);

    loop {}
}

DazWilkin avatar Oct 03 '25 21:10 DazWilkin

This reminds me of https://github.com/ch32-rs/ch32-hal/pull/95#issuecomment-2872380531

I have not personally tested main branch since the merge. You can try to point ch32-hal to a commit before this merge.

Edit: fix the branch name (Latin -> main), sorry

romainreignier avatar Oct 04 '25 07:10 romainreignier

The "Latin" branch is unclear to me (and a friendly LLM).

I tried referencing a few older commits but then encountered new issues with embassy-time references and gave up.

This is educational for me and I don't want to complicate your priorities.

I am willing to help debug if you are willing to provide me with specific guidance.

DazWilkin avatar Oct 06 '25 21:10 DazWilkin

I tried various alternatives (removing the embassy_executor::task, removing hal::println and println! references; removing all Output references) but the behavior was unchanged, until...

Removing embassy-time and Timer::after_millis results in a seemingly working solution.

Claude suggested using embassy-futures yield_now and, although I've lost the timing flexibility, this works.

So, for my combination at least, it appears that embassy-time use continues to be... problematic.

DazWilkin avatar Oct 06 '25 22:10 DazWilkin

I will have to check on a CH32V003 board myself.

romainreignier avatar Oct 14 '25 21:10 romainreignier

I have checked and I see the same behavior as you, which is also what I had in https://github.com/ch32-rs/ch32-hal/pull/95#issuecomment-2872380531. mcause is 0x80000026, and the PC changes on each wlink regs call but only among the lines around:

     fa8: 8004b5f3      csrrc   a1, 0x800, s1
     fac: 4602          lw      a2, 0x0(sp)
     fae: 4208          lw      a0, 0x0(a2)
     fb0: 89a1          andi    a1, a1, 0x8
     fb2: 00062023      sw      zero, 0x0(a2)
     fb6: c18d          beqz    a1, 0xfd8 <embassy_executor::arch::thread::Executor::run::h93bd8dc2b5b356e7+0xfe>
     fb8: 8004a073      csrs    0x800, s1
     fbc: a831          j       0xfd8 <embassy_executor::arch::thread::Executor::run::h93bd8dc2b5b356e7+0xfe>
     fbe: 8004b5f3      csrrc   a1, 0x800, s1

I still have no idea how to debug further.

wlink regs
08:56:58 [INFO] Connected to WCH-Link v2.12(v32) (WCH-LinkE-CH32V305)
08:56:58 [INFO] Attached chip: CH32V003 [CH32V003F4U6] (ChipID: 0x00310500)
08:56:58 [INFO] Dump GPRs
dpc(pc):   0x00000fae
x0   zero: 0x00000000
x1     ra: 0x00000fd6
x2     sp: 0x200007c4
x3     gp: 0x20000b20
x4     tp: 0x80c87054
x5     t0: 0x00000008
x6     t1: 0x0001d404
x7     t2: 0x00000000
x8     s0: 0x00000000
x9     s1: 0x00000008
x10    a0: 0x00000000
x11    a1: 0x00000000
x12    a2: 0x200007f8
x13    a3: 0x00000000
x14    a4: 0x40000000
x15    a5: 0x20000478
marchid  : 0xdc68d841
mimpid   : 0xdc688001
mhartid  : 0x00000000
misa     : 0x40800014
mtvec    : 0x20000003
mscratch : 0x4a314b2c
mepc     : 0x00000fd8
mcause   : 0x80000026
mtval    : 0x00000000
mstatus  : 0x00001888
dcsr     : 0x400000c3
dpc      : 0x00000fae
dscratch0: 0x00000000
dscratch1: 0x00000000
gintenr  : 0x00000000
intsyscr : 0x00000003
corecfgr : 0x00000000
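A few of the values in this dump can be decoded by hand. The sketch below is host-side Rust, not code from the HAL, and the struct and function names are invented for illustration. It uses only the standard RISC-V layout of `mcause` (top bit is the interrupt flag) and `mstatus` (MIE is bit 3, MPIE is bit 7, MPP is bits 12:11). Mapping interrupt number 38 to TIM2 on the CH32V003 is my reading of the vector table and should be treated as an assumption to check against the reference manual.

```rust
/// Fields decoded from a RISC-V `mcause` value.
#[derive(Debug, PartialEq)]
struct Mcause {
    is_interrupt: bool,
    code: u32,
}

/// Splits `mcause` into the interrupt flag (bit 31) and the cause code.
fn decode_mcause(raw: u32) -> Mcause {
    Mcause {
        is_interrupt: raw & 0x8000_0000 != 0,
        code: raw & 0x7FFF_FFFF,
    }
}

/// Machine-mode fields of `mstatus` that matter for this issue.
#[derive(Debug, PartialEq)]
struct Mstatus {
    mie: bool,  // global interrupt enable, bit 3
    mpie: bool, // previous interrupt enable, bit 7
    mpp: u32,   // previous privilege mode, bits 12:11
}

/// Extracts MIE, MPIE and MPP from a raw `mstatus` value.
fn decode_mstatus(raw: u32) -> Mstatus {
    Mstatus {
        mie: raw & (1 << 3) != 0,
        mpie: raw & (1 << 7) != 0,
        mpp: (raw >> 11) & 0b11,
    }
}

fn main() {
    // Values copied from the `wlink regs` dump above.
    println!("{:?}", decode_mcause(0x8000_0026));
    println!("{:?}", decode_mstatus(0x0000_1888));
}
```

So the last trap was interrupt 38, which would fit the time-driver-tim2 feature if the TIM2 mapping holds. Note also that mstatus has MIE set while gintenr reads 0x00000000 in the same dump.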

romainreignier avatar Oct 15 '25 09:10 romainreignier

total drive-by, but I notice that CSR 0x800, which is gintenr on V4, is not mentioned in the manual for V2. I think this has something to do with there being no U mode on V2. so, the problem may be in the critical-section impl (fwiw I've always been skeptical of the use of U mode on all ch32 precisely because of this register. best to just ignore it and use M mode like any other risc-v imo. actually I have wondered how hard it would be to just use riscv-rt but still get the PFIC stuff, I know they've been trying to make it more modular for ESP32 etc )

ExplodingWaffle avatar Oct 15 '25 16:10 ExplodingWaffle

Yes that is probably the issue, we probably need to either have all code run at M-mode, or have critical-section impl use MSTATUS directly

Codetector1374 avatar Oct 15 '25 16:10 Codetector1374

This is beyond my ken but I asked a friendly LLM:

#[qingke_rt::entry]
fn main() -> ! {
    println!("Begin test for gintenr (CSR 0x800)");
    
    let initial: usize;
    unsafe {
        core::arch::asm!(
            "csrr {0}, 0x800",
            out(reg) initial,
            options(nomem, nostack)
        );
    }
    println!("Initial value: {:#010x} (binary: {:08b})", initial, initial);
    
    // Write 1 (enable)
    unsafe {
        core::arch::asm!(
            "csrw 0x800, {0}",
            in(reg) 1usize,
            options(nomem, nostack)
        );
    }
    
    let after_one: usize;
    unsafe {
        core::arch::asm!(
            "csrr {0}, 0x800",
            out(reg) after_one,
            options(nomem, nostack)
        );
    }
    println!("After writing 1: {:#010x} (binary: {:08b})", after_one, after_one);
    
    // Write 0 (disable)
    unsafe {
        core::arch::asm!(
            "csrw 0x800, {0}",
            in(reg) 0usize,
            options(nomem, nostack)
        );
    }
    
    let after_zero: usize;
    unsafe {
        core::arch::asm!(
            "csrr {0}, 0x800",
            out(reg) after_zero,
            options(nomem, nostack)
        );
    }
    println!("After writing 0: {:#010x} (binary: {:08b})", after_zero, after_zero);
    
    println!("\nConclusion");
    if after_one != initial || after_zero != initial {
        println!("gintenr IS writable and functional!");
    } else {
        println!("gintenr exists but appears read-only or non-functional");
    }
    
    loop {}
}

And on the CH32V003F4P6 it outputs:

Begin test for gintenr (CSR 0x800)
Initial value: 0x00000000 (binary: 00000000)
After writing 1: 0x00000000 (binary: 00000000)
After writing 0: 0x00000000 (binary: 00000000)
Conclusion
gintenr exists but appears read-only or non-functional

DazWilkin avatar Oct 15 '25 16:10 DazWilkin

I have been running some tests today and I have noticed that using the arch-riscv32 feature of embassy-executor instead of arch-spin, I was able to get the blinky running much longer (like 30 min to 1h30). But at the end, I always end with this panic message:

panicked at /home/rre/embedded_rust/ch32/ch32-ha/src/embassy/time_driver_tim.rs:256:46:
RefCell already borrowed

The RefCell should be protected with a critical section, so I think this matches @ExplodingWaffle's remark.

I have also tried to use portable-atomic instead of core::sync::atomic but the results were not interesting, still hang.

romainreignier avatar Oct 15 '25 17:10 romainreignier

I believe I just ran into a similar issue. I've some experience with the v003 using C and the ch32 fun framework, and wanted to get my feet wet using embedded rust. I used embassy as having async would be great for one of the things I wanna do.

I used the template repository to create a project and it also uses an LED-blinking embassy example. I see exactly one 'tick' printout and then nothing. When pressing reset, again one tick and it hangs.

Do I interpret this issue correctly that embassy on the V003 is completely broken and cannot be used right now?

markusdd avatar Nov 30 '25 16:11 markusdd

Do I interpret this issue correctly that embassy on the V003 is completely broken and cannot be used right now?

Exactly, but you can use a commit hash prior to #95 merge and it was working great.

romainreignier avatar Nov 30 '25 17:11 romainreignier

thanks that is a good hint, will try this later! Is it planned to fix this so the 003 gets unbricked again or will we have to stick with an older ch32-hal version?

markusdd avatar Nov 30 '25 17:11 markusdd

It would be nice if someone managed to find the issue and fix it. Personally, I have tried to look at it but have no idea what the issue is exactly :(

romainreignier avatar Nov 30 '25 17:11 romainreignier

The problem is that this is a massive PR. I quickly had a look at it but I do not feel at all confident in calling what the issue might be, as that would probably need a much better overview of how the CH32 CPUs differ. It would be easier if it did not compile at all, but runtime errors on CPUs like this really are not fun to debug.

We would probably need to either step through or insert strategic debug prints to find which exact operation goes wrong.

markusdd avatar Nov 30 '25 18:11 markusdd

Probably would either need to step through or insert strategic debug prints to find which exact operation goes wrong.

This is probably not going to work as expected, given that the issue is likely a race condition.

I have also tried to use portable-atomic instead of core::sync::atomic but the results were not interesting, still hang.

core::sync::atomic is not really going to work right? QingKe V2A does not even have support for Atomic.

Anyways, looking at the test done by @DazWilkin, this more or less aligns with what I suspect: gintenr is not actually present on QingKe V2. Without it there is no proper critical section support, so portable-atomic, which is implemented with critical sections, is just not going to work whatsoever.

What's needed here is probably a different critical section implementation for QingKe V2A that interacts directly with MSTATUS. Given that it has no U-mode, there is no point in dancing around the problem using gintenr.
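To make the suspected failure concrete, here is a small host-side model. It is not the qingke crate's code; `Core`, `Reg` and `masked_during_section` are hypothetical names. It mimics the acquire/release pattern visible in the disassembly earlier in the thread (`csrrc` on CSR 0x800 with mask 0x8, then a conditional `csrs`), and it assumes, based on the test output above, that on a V2-like core gintenr reads as zero and ignores writes.

```rust
/// Bit 3 is the global interrupt enable (MIE).
const MIE: u32 = 1 << 3;

/// Which CSR the critical section implementation talks to.
#[derive(Clone, Copy)]
enum Reg {
    Gintenr,
    Mstatus,
}

/// Toy core: `mie` is the real interrupt-enable state.
/// `has_gintenr = false` models a core where CSR 0x800 reads 0 and ignores writes.
struct Core {
    mie: bool,
    has_gintenr: bool,
}

impl Core {
    /// Whether accesses to `reg` actually reach the interrupt-enable state.
    fn backed(&self, reg: Reg) -> bool {
        match reg {
            Reg::Mstatus => true,
            Reg::Gintenr => self.has_gintenr,
        }
    }

    /// Models `csrrc rd, csr, 0x8`: return the old MIE bit, then clear it.
    fn csrrc_mie(&mut self, reg: Reg) -> u32 {
        if !self.backed(reg) {
            return 0;
        }
        let old = if self.mie { MIE } else { 0 };
        self.mie = false;
        old
    }

    /// Models `csrs csr, 0x8`: set the MIE bit.
    fn csrs_mie(&mut self, reg: Reg) {
        if self.backed(reg) {
            self.mie = true;
        }
    }
}

/// Runs acquire / body / release and reports whether interrupts
/// were really masked while the body would have run.
fn masked_during_section(core: &mut Core, reg: Reg) -> bool {
    let was_enabled = core.csrrc_mie(reg) & MIE != 0; // acquire
    let masked = !core.mie; // state seen by the protected body
    if was_enabled {
        core.csrs_mie(reg); // release: restore only if it was enabled
    }
    masked
}

fn main() {
    let mut v2 = Core { mie: true, has_gintenr: false };
    println!("via gintenr: masked = {}", masked_during_section(&mut v2, Reg::Gintenr));
    let mut v2 = Core { mie: true, has_gintenr: false };
    println!("via mstatus: masked = {}", masked_during_section(&mut v2, Reg::Mstatus));
}
```

In this model the gintenr path never masks interrupts on the V2-like core, so an interrupt handler can run inside the "critical" section. The mstatus path masks them and restores the previous state afterwards.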

Codetector1374 avatar Dec 01 '25 16:12 Codetector1374

You are absolutely correct I would say. I cross checked and as far as I can see what is being tried here can simply not be done on the smaller/simpler cores.

markusdd avatar Dec 01 '25 22:12 markusdd

You are absolutely correct I would say. I cross checked and as far as I can see what is being tried here can simply not be done on the smaller/simpler cores.

You should just be able to enable the critical-section feature from the riscv crate instead of the qingke one to get a working critical section

ExplodingWaffle avatar Dec 02 '25 16:12 ExplodingWaffle

mmh this is a good idea.

still, shouldn't this then be changed in the embassy adaptor of this repo? Because as far as I can see PR #95 completely bricked the simpler WCH MCUs.

markusdd avatar Dec 02 '25 17:12 markusdd

to me, the issue seems to be with the critical section impl in the qingke crate. not certain enough to say anything about the time driver code before/after that PR but if it works with the good CS then I would imagine thats the whole issue. CS needs fixing to do anything serious anyway.

edit: I would test on v003 myself but I dont have one to hand

ExplodingWaffle avatar Dec 02 '25 17:12 ExplodingWaffle

looking at the PR again I tried to understand why it apparently worked before, although critical sections were already used. The PR also added an interrupt at the very top for Systick. So it was probably broken even before, but nothing failed as long as interrupts did not interfere?

But you have a good point then. The breakage has nothing to do with the PR itself, it's just a symptom. It's probably an issue that extends into the qingke crate.

markusdd avatar Dec 02 '25 17:12 markusdd

Yeah, comparing the implementations, that most certainly is the issue. I raised a ticket to track this: https://github.com/ch32-rs/qingke/issues/15

markusdd avatar Dec 02 '25 17:12 markusdd

Thank you everyone!

embassy_blinky is now working (>5 minutes and continuing) for me.

DazWilkin avatar Dec 09 '25 16:12 DazWilkin