ethercrab icon indicating copy to clipboard operation
ethercrab copied to clipboard

Stuck in PRE-OP: BK1120 EtherCAT K-bus coupler PRE-OP -> OP: NotFound { item: SyncManager, index: Some(4) }

Open andergisomon opened this issue 8 months ago • 4 comments

Environment

  • Version of ethercrab in use or git hash: 0.6.0
  • Operating system kind and version: Pop!_OS 22.04 LTS x86_64
  • EtherCAT devices in use: BK1120 chained after an EK1100

Uploads

debug_output.txt

bk1120_capture_0.6.0_2025-04-09T08:23+08:00.zip

Description

BK1120 is stuck in pre-op, with program output:

thread 'main' panicked at src/main.rs:114:10:
PRE-OP -> SAFE-OP: NotFound { item: SyncManager, index: Some(4) }

Just before exiting, thrown by SubDeviceGroup::into_op

Code used to produce the bug

Just the EK1100 example with maindeviceconfig timeouts modified

//! Demonstrate setting outputs using a Beckhoff EK1100/EK1501 and modules.
//!
//! Run with e.g.
//!
//! Linux
//!
//! ```bash
//! RUST_LOG=debug cargo run --example ek1100 --release -- eth0
//! ```
//!
//! Windows
//!
//! ```ps
//! $env:RUST_LOG="debug" ; cargo run --example ek1100 --release -- '\Device\NPF_{FF0ACEE6-E8CD-48D5-A399-619CD2340465}'
//! ```

use env_logger::Env;
use ethercrab::{
    MainDevice, MainDeviceConfig, PduStorage, Timeouts, error::Error, std::ethercat_now, RetryBehaviour
};
use std::{
    sync::{
        Arc,
        atomic::{AtomicBool, Ordering},
    },
    time::Duration,
};
use tokio::time::MissedTickBehavior;

/// Maximum number of SubDevices that can be stored. This must be a power of 2 greater than 1.
const MAX_SUBDEVICES: usize = 16;
/// Maximum PDU data payload size - set this to the max PDI size or higher.
const MAX_PDU_DATA: usize = PduStorage::element_size(1100);
/// Maximum number of EtherCAT frames that can be in flight at any one time.
const MAX_FRAMES: usize = 16;
/// Maximum total PDI length.
const PDI_LEN: usize = 64;

static PDU_STORAGE: PduStorage<MAX_FRAMES, MAX_PDU_DATA> = PduStorage::new();

#[tokio::main]
async fn main() -> Result<(), Error> {
    env_logger::Builder::from_env(Env::default().default_filter_or("info")).init();

    let interface = std::env::args()
        .nth(1)
        .expect("Provide network interface as first argument.");

    log::info!("Starting EK1100/EK1501 demo...");
    log::info!(
        "Ensure an EK1100 or EK1501 is the first SubDevice, with any number of modules connected after"
    );
    log::info!("Run with RUST_LOG=ethercrab=debug or =trace for debug information");

    let (tx, rx, pdu_loop) = PDU_STORAGE.try_split().expect("can only split once");

    let maindevice = Arc::new(MainDevice::new(
        pdu_loop,
        Timeouts { // BK is a bit sluggish
            state_transition: Duration::from_millis(10000), // 5000
            pdu: Duration::from_micros(30_000), // 30_000
            eeprom: Duration::from_millis(50), // 10
            wait_loop_delay: Duration::from_millis(2), // 2
            mailbox_echo: Duration::from_millis(1000), // 100
            mailbox_response: Duration::from_millis(15000), // 1000
        },
       MainDeviceConfig {dc_static_sync_iterations: 10_000, retry_behaviour: RetryBehaviour::Forever}
    ));

    #[cfg(target_os = "windows")]
    std::thread::spawn(move || {
        ethercrab::std::tx_rx_task_blocking(
            &interface,
            tx,
            rx,
            ethercrab::std::TxRxTaskConfig { spinloop: false },
        )
        .expect("TX/RX task")
    });
    #[cfg(not(target_os = "windows"))]
    tokio::spawn(ethercrab::std::tx_rx_task(&interface, tx, rx).expect("spawn TX/RX task"));

    let group = maindevice
        .init_single_group::<MAX_SUBDEVICES, PDI_LEN>(ethercat_now)
        .await
        .expect("Init");

    log::info!("Discovered {} SubDevices", group.len());

    for subdevice in group.iter(&maindevice) {
        if subdevice.name() == "EL3004" {
            log::info!("Found EL3004. Configuring...");

            subdevice.sdo_write(0x1c12, 0, 0u8).await?;

            subdevice
                .sdo_write_array(0x1c13, &[0x1a00u16, 0x1a02, 0x1a04, 0x1a06])
                .await?;

            // The `sdo_write_array` call above is equivalent to the following
            // subdevice.sdo_write(0x1c13, 0, 0u8).await?;
            // subdevice.sdo_write(0x1c13, 1, 0x1a00u16).await?;
            // subdevice.sdo_write(0x1c13, 2, 0x1a02u16).await?;
            // subdevice.sdo_write(0x1c13, 3, 0x1a04u16).await?;
            // subdevice.sdo_write(0x1c13, 4, 0x1a06u16).await?;
            // subdevice.sdo_write(0x1c13, 0, 4u8).await?;
        }
    }

    let group = group.into_op(&maindevice).await.expect("PRE-OP -> OP");

    for subdevice in group.iter(&maindevice) {
        let io = subdevice.io_raw();

        log::info!(
            "-> SubDevice {:#06x} {} inputs: {} bytes, outputs: {} bytes",
            subdevice.configured_address(),
            subdevice.name(),
            io.inputs().len(),
            io.outputs().len()
        );
    }

    let mut tick_interval = tokio::time::interval(Duration::from_millis(5));
    tick_interval.set_missed_tick_behavior(MissedTickBehavior::Skip);

    let shutdown = Arc::new(AtomicBool::new(false));
    signal_hook::flag::register(signal_hook::consts::SIGINT, Arc::clone(&shutdown))
        .expect("Register hook");

    loop {
        // Graceful shutdown on Ctrl + C
        if shutdown.load(Ordering::Relaxed) {
            log::info!("Shutting down...");

            break;
        }

        group.tx_rx(&maindevice).await.expect("TX/RX");

        // Increment every output byte for every SubDevice by one
        for subdevice in group.iter(&maindevice) {
            let mut o = subdevice.outputs_raw_mut();

            for byte in o.iter_mut() {
                *byte = byte.wrapping_add(1);
            }
        }

        tick_interval.tick().await;
    }

    let group = group
        .into_safe_op(&maindevice)
        .await
        .expect("OP -> SAFE-OP");

    log::info!("OP -> SAFE-OP");

    let group = group
        .into_pre_op(&maindevice)
        .await
        .expect("SAFE-OP -> PRE-OP");

    log::info!("SAFE-OP -> PRE-OP");

    let _group = group.into_init(&maindevice).await.expect("PRE-OP -> INIT");

    log::info!("PRE-OP -> INIT, shutdown complete");

    Ok(())
}

andergisomon avatar Apr 09 '25 00:04 andergisomon

The FMMU and sync managers discovered by ethercrab matched the ones in TwinCAT (Advanced Settings > General > FMMU / SM).

The BK1120 coupler did have other K terminals connected to it: KL2889, KL1889, KL6581). But I ran it without any terminals, and with just the KL2889 and KL1889 and the same problem arose. So it's unlikely that the issue is coming from the terminals rather just the coupler itself.

I'll try to compare the wireshark captures between TwinCAT and ethercrab and see what's missing.

andergisomon avatar Apr 09 '25 09:04 andergisomon

Anyways, seems to originate from here: https://github.com/ethercrab-rs/ethercrab/blob/a7285c84fbb33049e26b7a80a1ea4e0e739d5e08/src/subdevice/eeprom.rs#L257

Ultimately from here, where sync_manager_index causes sync_managers to be accessed out of bounds: https://github.com/ethercrab-rs/ethercrab/blob/a7285c84fbb33049e26b7a80a1ea4e0e739d5e08/src/subdevice/configuration.rs#L338-L346

The thing is, debug shows that the SMs for 0x1004 (the BK1120) was discovered by ethercrab just fine.

There's probably an indexing error somewhere because the error is stemming from sync_manager_index. There are 4 SMs, but defined 0 .. 3.

Edit: Yep, thats exactly what happened

Image

andergisomon avatar Apr 10 '25 00:04 andergisomon

Edit: Lol no, seems like it really is coming from the .iter().enumerate(). Honestly I can't see how it's even possible for ... mailbox.coe_sync_manager_types to return an iterator bigger than 4

Image

The length of the sms Vec in SubDeviceRef::configure_mailboxes is also 4. I really hope this isn't an std bug💀. Either way, some hidden control flow is mysteriously appending coe_sync_manager_types.

andergisomon avatar Apr 11 '25 06:04 andergisomon

Seems to have originated here:

https://github.com/ethercrab-rs/ethercrab/blob/e287329bd4adb07de4bf0f162728107cd703a5a8/src/subdevice/configuration.rs#L54-L60

The ESI file in TwinCAT does show the BK1120 supports complete access. The sdo_read with complete access here returns 6 SMs.

andergisomon avatar Jun 27 '25 06:06 andergisomon

Thanks for investigating! I wish I had the time to do it myself :D

Anyway, could you try #314 and see if it works? E.g. this in your Cargo.toml:

ethercrab = { git = "https://github.com/ethercrab-rs/ethercrab.git", rev = "3057508d723fa3a5c46ed9925293e578328cd63b" }

jamwaffles avatar Jun 27 '25 15:06 jamwaffles

Will do first thing Monday :) Soak testing my Pi over the weekend.

andergisomon avatar Jun 27 '25 15:06 andergisomon

Yep it works :D

andergisomon avatar Jun 30 '25 00:06 andergisomon

Great! Thanks for testing!

jamwaffles avatar Jun 30 '25 08:06 jamwaffles