esp-hal
esp-hal copied to clipboard
HIL testing issues on different targets
Failing Tests
- [ ] ESP32-S2:
spi_full_duplex_dma
-
test_symmetric_dma_transfer_huge_buffer
If we add the following code:
The issue disappears, but not sure if that's the root cause of the issue because this is not required in other targets.--- a/hil-test/tests/spi_full_duplex_dma.rs +++ b/hil-test/tests/spi_full_duplex_dma.rs @@ -121,6 +121,8 @@ mod tests { let transfer = spi.dma_transfer(&mut send, &mut receive).unwrap(); transfer.wait().unwrap(); assert_eq!(send[0..1], receive[0..1]); + let transfer = spi.dma_transfer(&mut send, &mut receive).unwrap(); + transfer.wait().unwrap(); }
-
test_symmetric_dma_transfer
fails if previous test failed Im not sure what triggers this, but I think that when some test failed, the init state is bad and the first test ofspi_full_duplex_dma
fails.running 3 tests test tests::test_symmetric_dma_transfer ... ERROR panicked at 'assertion failed: `(left == right)`' diff < left / right > <[222, 173, 190, 239] >[0, 0, 0, 0] └─ spi_full_duplex_dma::tests::test_symmetric_dma_transfer @ tests/spi_full_duplex_dma.rs:81
-
- [ ] ESP32-S2:
delay
Intermittently fails thedelay_700millis
test. See https://github.com/esp-rs/esp-hal/actions/runs/8880612772/job/24381385302 - [ ] ESP32-C3/ESP32-C2:
uart::test_send_receive_different_baud_rates_and_clock_sources
When using theRcFast
clock source, the code just hangs in https://github.com/esp-rs/esp-hal/blob/main/esp-hal/src/uart.rs#L1108 and then, the test times out. Thereg_update
should:Software write 1 would synchronize registers into UART Core clock domain and would be cleared by hardware after synchronization is done.
but looks like its never cleared because it's not synchronized. - [x] ESP32-S3/ESP32-S2:
gpio::test_gpio_interrupt
(https://github.com/esp-rs/esp-hal/issues/1413) - [x] ESP32-H2:
get_time
(https://github.com/esp-rs/esp-hal/issues/1509) - [ ] ESP32/ESP32-S2
i2s
issue - [x] ESP32-H2
i2s
(https://github.com/esp-rs/esp-hal/issues/1637): https://github.com/esp-rs/esp-hal/pull/1755 - [ ] ESP32-C2 Delays are not working as expected, hence,
delay
andget_time
tests are failing.- Migth be similar to https://github.com/esp-rs/esp-hal/pull/1535
There are FIFO access issues on the S2 with uart, maybe there is something similar here?
There are FIFO access issues on the S2 with uart, maybe there is something similar here?
They could definitely be related, is there a tracking issue for that?
There have been a string of failures for the ESP32-S3, so to keep things moving I have removed the requirement for this check from the merge queue configuration. Once the issue has been resolved we will have to re-add this requirement.
There have been a string of failures for the ESP32-S3, so to keep things moving I have removed the requirement for this check from the merge queue configuration. Once the issue has been resolved we will have to re-add this requirement.
Did some testing locally and, looks like spi_full_duplex_dma
tests fails in S3 after https://github.com/esp-rs/esp-hal/pull/1859, the following changes makes the test pass in main
:
❯ git diff
error: cannot run less: No such file or directory
diff --git a/esp-hal/src/spi/master.rs b/esp-hal/src/spi/master.rs
index e6f407a6..a006fc63 100644
--- a/esp-hal/src/spi/master.rs
+++ b/esp-hal/src/spi/master.rs
@@ -2036,7 +2036,6 @@ where
self.update();
reset_dma_before_load_dma_dscr(reg_block);
- self.clear_dma_interrupts();
tx_chain.fill_for_tx(false, write_buffer_ptr, write_buffer_len)?;
tx.prepare_transfer_without_start(self.dma_peripheral(), tx_chain)
.and_then(|_| tx.start_transfer())?;
@@ -2044,6 +2043,7 @@ where
rx.prepare_transfer_without_start(self.dma_peripheral(), rx_chain)
.and_then(|_| rx.start_transfer())?;
+ self.clear_dma_interrupts();
reset_dma_before_usr_cmd(reg_block);
reg_block.cmd().modify(|_, w| w.usr().set_bit());
That's odd, when I run it on my S3 it works just fine.
What the output when you run spi_full_duplex_dma
?
Here is the full log:
Full log
Running tests/spi_full_duplex_dma.rs (target/xtensa-esp32s3-none-elf/release/deps/spi_full_duplex_dma-e539faf5a49e6c4d)
Erasing ✔ [00:00:00] [#########################################################################################################################################] 192.00 KiB/192.00 KiB @ 323.19 KiB/s (eta 0s )
Programming ✔ [00:00:00] [############################################################################################################################################] 45.08 KiB/45.08 KiB @ 59.17 KiB/s (eta 0s ) Finished in 1.368s
running 6 tests
test tests::test_symmetric_dma_transfer ... ok
test tests::test_asymmetric_dma_transfer ... ok
test tests::test_symmetric_dma_transfer_huge_buffer ... ERROR ====================== PANIC ======================
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR panicked at /home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/defmt-0.3.8/src/lib.rs:367:5:
explicit panic
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR Backtrace:
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x42005816
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x4200580a
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x4200332a
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x420025ca
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x42001aee
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x42001361
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x4200259f
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x42005c5e
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x420024fa
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
ERROR 0x420081a2
└─ esp_backtrace::panic_handler @ /home/sergio/Documents/Espressif/esp-rs/esp-hal/esp-backtrace/src/lib.rs:25
Frame 0: syscall_readonly @ 0x4200746d inline
/home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/syscall/xtensa.rs:29:9
Frame 1: sys_exit_extended @ 0x000000004200746d inline
/home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/mod.rs:196:9
Frame 2: exit @ 0x0000000042007461 inline
/home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/sys/arm_compat/mod.rs:165:5
Frame 3: exit @ 0x0000000042007461
/home/sergio/.cargo/registry/src/index.crates.io-6f17d22bba15001f/semihosting-0.1.13/src/process.rs:14:5
Frame 4: <unknown function @ 0x82007490> @ 0x82007490
FAILED
test tests::test_try_using_non_dma_memory_tx_buffer ... ok
test tests::test_try_using_non_dma_memory_rx_buffer ... ok
test tests::test_symmetric_dma_transfer_owned ... ok
failures:
---- tests::test_symmetric_dma_transfer_huge_buffer ----
Test should Pass but it did Panic
failures:
tests::test_symmetric_dma_transfer_huge_buffer
test result: FAILED. 5 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 2.53s
Error: Some tests failed
error: test failed, to rerun pass `--test spi_full_duplex_dma`
Caused by:
process didn't exit successfully: `probe-rs run /home/sergio/Documents/Espressif/esp-rs/esp-hal/hil-test/target/xtensa-esp32s3-none-elf/release/deps/spi_full_duplex_dma-e539faf5a49e6c4d --chip esp32s3 -Zbuild-std=core,alloc` (exit status: 1)
The error occurs both when running the whole test suite and only the spi_full_duplex_dma
test. I'm running espflash erase-flash
before running the tests and I'm using a ESP32-S3-DevKitC-1
Ah the stack doesn't show where the panic happened. I may have some time this evening to look further.
It's rather odd that this only fails on the S3 but only sometimes, and undoing it breaks async spi on the Esp32
I was able to reproduce the failing HIL test and have done a bit of digging. (sorry for the wall of text, I wasn't planning on writing this much :grimacing: )
The quick fix for the HIL tests is to do PeripheralClockControl::reset(Peripheral::Gdma);
right after the PeripheralClockControl::enable(Peripheral::Gdma);
in the Dma::new
implementation.
(This should probably be done in all drivers in general)
This was hinting that there was some kind of lingering broken state in the SPI/DMA driver.
Similar to what you found in this PR @SergioGasquez, it appears that even erasing the flash isn't enough to reset this broken state.
Without the PeripheralClockControl::reset(Peripheral::Gdma);
, I needed to flash the ESP32-S3 with unrelated code, power cycle (unplug and replug USB) and then rerun the tests.
I don't know enough about ESP32 resets/hardware to make a suggestion yet but whatever reset procedure the HIL tests are using isn't cutting it. I'm able to observe lingering interrupt bits from previous runs.
(This may be the cause for some of the issues in the first comment)
I would've made a PR for the quick fix but it's not quite appropriate. Even if we make the HIL tests pass by resetting the GDMA peripheral, the reality is users don't have this luxury. Once they enter the broken state the spi driver stops working as expected.
I've narrowed down the broken state to be coming from test_asymmetric_dma_transfer
. Removing that test makes everything run fine (even without PeripheralClockControl::reset(Peripheral::Gdma);
).
My current hypothesis for why it leaves the DMA in a broken state is based on the fact that the SPI peripheral doesn't support asymmetric transfers.
The test tries to send 4 bytes and receives 2 bytes. The driver instructs the hardware to transfer 4 bytes. Unfortunately the SPI reads 4 bytes (and emits an EOF) and RX half of the DMA channel is only expecting 2 bytes, which means it's unable to receive the EOF. I know this because the SPI_DMA_INFIFO_FULL_ERR_INT_RAW
bit is high.
I'm not certain why this puts the DMA channel in this weird state but I reckon it has something to do with the asymmetric transfer.
My current thinking for a proper solution is to remove support for asymmetric transfers in the base driver. The hardware doesn't support it directly. An asymmetric transfer should be implemented as two transfers, one full duplex symmetric and one half duplex. i.e. Transfer + Write or Transfer + Read. (This also makes error handling a bit better since asymmetric transfers trigger the error interrupts)
Thoughts?
One thing I discovered while looking into another (unrelated) future HIL test PR: apparently JTAG reset (which is done by the HIL tests) doesn't reset peripherals on all chips (it does on e.g. ESP32-C6/ESP32-C3 but doesn't on ESP32-S3 - probably it's different for Xtensa and RISC-V based chips).
The quick fix for the HIL tests is to do PeripheralClockControl::reset(Peripheral::Gdma); right after the PeripheralClockControl::enable(Peripheral::Gdma); in the Dma::new implementation. (This should probably be done in all drivers in general)
I added the reset also for I2C which tends to get into weird states on errors. I agree we probably should do it for all drivers - I'll open an issue for that
probe-rs doesn't yet do a proper system reset for Xtensa targets, that can explain why peripherals aren't reset. It's coming, but I need someone to merge changes over there to progress :)
Just updated the initial list, only two issues left!
Don't worry @SergioGasquez we'll have more once ESP32 is on line 🙃
ESP32-C3/ESP32-C2: uart::test_send_receive_different_baud_rates_and_clock_sources When using the RcFast clock source, the code just hangs in https://github.com/esp-rs/esp-hal/blob/main/esp-hal/src/uart.rs#L1108 and then, the test times out. The reg_update should: Software write 1 would synchronize registers into UART Core clock domain and would be cleared by hardware after synchronization is done. but looks like its never cleared because it's not synchronized.
The issue is, RcFast is not running. Fixed in #2170
ESP32-S2: https://github.com/esp-rs/esp-hal/issues/2098
The main issue is PDMA writing more data than expected. If the buffer's size is not 1 mod 4, PDMA will transfer a whole word, regardless of how many bytes we asked for.
"Fixed" by #2179, by ensuring that DMA buffers are always a) aligned and b) a multiple of 4 bytes, regardless of what the user specified.
ESP32: spi_full_duplex_dma_async and spi_full_duplex_dma_pcnt
I'm not sure what's wrong here, the tests are enabled and passing on ESP32
ESP32: spi_full_duplex_dma_async and spi_full_duplex_dma_pcnt
I'm not sure what's wrong here, the tests are enabled and passing on ESP32
They were fix with https://github.com/esp-rs/esp-hal/pull/2131, I will tick them.
I2S on ESP32-S2 just needed enabling, done in #2181.
I2S on ESP32 generates two wrong clock pulses. This made the receiver reject the first sample pair, causing the test to fail. Issue worked around in #2194 by assigning the first halfword to the right channel and by outputting the right channel first.
Receivers may still get freaked out by the 2 extra clock pulses they encounter before the first sample, but I'm not sure what to do about that.
If a HIL test fails, would be sensible to automatically rerun it with defmt
enabled? The obscure stacks are a little unhelpful.
Yeah that's something we can try. But if a test is spotty, the second run might just succeed. Same if the failure relies on some timing circumstances.
In case of I2S I'm suspecting dodgy wiring or some default pull resistor getting in the way.
Yeah that's something we can try. But if a test is spotty, the second run might just succeed. Same if the failure relies on some timing circumstances.
Ah fair point. The 2nd run's output may actually be misleading too.
In case of I2S I'm suspecting dodgy wiring or some default pull resistor getting in the way.
I was going to suggest reducing the I2S frequency in case it's that but 16kHz is already quite low.
The obscure stacks are a little unhelpful.
Getting useful backtraces would be good
Getting useful backtraces would be good
I would like to solve this in probe-rs if possible, instead of hacking around with reruns and relying on defmt that introduces timing jitter. Even if that timing jitter is actually a good way to inject errors and we might want to be resilient there :D
Stack traces will be fixed with https://github.com/probe-rs/probe-rs/pull/2831 and https://github.com/probe-rs/probe-rs/pull/2829 so I'll try an update once they are in.
To avoid this being an infinite issue, I've created https://github.com/esp-rs/esp-hal/issues/2312, https://github.com/esp-rs/esp-hal/issues/2313, https://github.com/esp-rs/esp-hal/issues/2289, and https://github.com/esp-rs/esp-hal/issues/2314. Which were the remaining items on the list, and I'll close this issue.