depthai-core

[BUG] Oak-D-POE fails to reconnect after disconnection

isherman opened this issue 2 years ago • 4 comments

Describe the bug

When I repeatedly connect to and disconnect from my OAK-D-PoE, the connection fails with the error X_LINK_DEVICE_NOT_FOUND after a handful of iterations (anywhere from 1 to 10). This happens even when I explicitly specify the device's static IP address.

The failure tends to occur after fewer iterations when the delay between disconnection and reconnection is short (250 ms), though it also occurs with delays of 1 s or longer.

This is with Library 2.15.0, OpenVINO 2021.4, Bootloader 0.0.17.

To Reproduce

  • Run the program below with --ip <static_ip> --loop-delay 0.25.
  • Wait N iterations until a std::runtime_error is thrown with X_LINK_DEVICE_NOT_FOUND.
  • Apologies for the CLI11 dependency; I can remove it if you'd like.
```cpp
#include <chrono>
#include <cstring>
#include <iostream>
#include <memory>
#include <optional>
#include <string>
#include <thread>
#include <vector>

#include <CLI/CLI.hpp>
#include <depthai/depthai.hpp>

struct options {
  std::vector<std::string> ip_addresses;
  size_t num_frames = 60;
  std::optional<double> loop_delay;
};

// Devices to connect to: every device XLink can discover, or one entry
// per static IP address given on the command line.
std::vector<dai::DeviceInfo> get_device_info(const options& opts) {
  if (opts.ip_addresses.empty()) {
    return dai::Device::getAllAvailableDevices();
  }

  std::vector<dai::DeviceInfo> result;
  for (auto& ip_address : opts.ip_addresses) {
    auto device_info = dai::DeviceInfo();
    device_info.state = X_LINK_BOOTLOADER;
    device_info.desc.protocol = X_LINK_TCP_IP;
    // Bounded copy so an overlong argument cannot overflow desc.name.
    std::strncpy(device_info.desc.name, ip_address.c_str(),
                 sizeof(device_info.desc.name) - 1);
    device_info.desc.name[sizeof(device_info.desc.name) - 1] = '\0';
    result.push_back(device_info);
  }
  return result;
}

// Connect to every device, pull num_frames RGB preview frames from each,
// then disconnect when `devices` goes out of scope.
void run_one(const options& opts) {
  dai::Pipeline pipeline;
  auto camRgb = pipeline.create<dai::node::ColorCamera>();
  auto xoutRgb = pipeline.create<dai::node::XLinkOut>();
  xoutRgb->setStreamName("rgb");
  camRgb->preview.link(xoutRgb->input);

  std::cout << "Initializing devices..." << std::endl;
  std::vector<std::unique_ptr<dai::Device>> devices;
  for (auto& device_info : get_device_info(opts)) {
    devices.push_back(std::make_unique<dai::Device>(pipeline, device_info));
  }

  for (size_t frame = 0; frame < opts.num_frames; ++frame) {
    for (auto& device : devices) {
      device->getOutputQueue("rgb")->get<dai::ImgFrame>();
      std::cout << ".";
    }
    std::cout << "|";
  }
  std::cout << std::endl;
}

int main(int argc, char** argv) {
  CLI::App app{argv[0]};
  options opts;
  app.add_option(
      "--ip",
      opts.ip_addresses,
      "Camera IP(s). If none provided, use getAllAvailableDevices.");
  app.add_option("--frames", opts.num_frames, "# of frames to query.");
  app.add_option("--loop-delay", opts.loop_delay, "Loop with delay (sec)");
  CLI11_PARSE(app, argc, argv);

  // Run once; if --loop-delay is given, reconnect forever with that delay.
  do {
    run_one(opts);

    using duration_t = std::chrono::duration<double>;
    std::this_thread::sleep_for(duration_t(opts.loop_delay.value_or(0)));
  } while (opts.loop_delay);

  return 0;
}
```
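Step 2 of the reproduction relies on watching the console for the iteration that throws. If you'd rather record N programmatically, here is a minimal sketch, assuming a failed connection surfaces as std::runtime_error as reported above. `iterations_until_failure` is a hypothetical helper, not part of depthai-core or the original program; it wraps any callable (for example `[&] { run_one(opts); }`) and returns the 1-based iteration on which the callable first throws.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <thread>

// Hypothetical harness: calls `run_once` up to `max_iterations` times,
// sleeping `delay` after each successful call. Returns the 1-based
// iteration on which `run_once` first threw std::runtime_error, or
// std::nullopt if every iteration succeeded.
template <typename RunOnce>
std::optional<std::size_t> iterations_until_failure(
    RunOnce run_once, std::chrono::milliseconds delay,
    std::size_t max_iterations) {
  assert(delay.count() >= 0 && "delay must be non-negative");
  for (std::size_t i = 1; i <= max_iterations; ++i) {
    try {
      run_once();
    } catch (const std::runtime_error& e) {
      std::cerr << "iteration " << i << " failed: " << e.what() << "\n";
      return i;
    }
    std::this_thread::sleep_for(delay);
  }
  return std::nullopt;
}
```

For example, `iterations_until_failure([&] { run_one(opts); }, std::chrono::milliseconds(250), 100)` would mirror `--loop-delay 0.25` with a cap of 100 iterations. Repeating it for several delays would make the delay dependence described earlier easier to compare.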

Expected behavior

I'd expect to be able to repeatedly disconnect from and reconnect to the camera.

Attach system log

log_system_information.json.log

Additional context

output.log (with DEPTHAI_LEVEL=debug)

isherman, Mar 08 '22 20:03

On Discord, Luxonis-Erik tells me:

oak poes need about 5-10sec between reconnection, so watchdog has time to kick in and reset the device in Discovery mode

However, I can reliably reproduce the failure with the program above using --loop-delay 20 (seconds): it failed after 25 iterations in one run and after 50 in another.
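Following Erik's advice, one hedged workaround is to retry the connection for a bounded time instead of failing on the first X_LINK_DEVICE_NOT_FOUND. This is a minimal sketch, assuming retrying is acceptable for your application. `connect_with_retry` is a hypothetical helper, not part of depthai-core; in the repro program the callable would be something like `[&] { return std::make_unique<dai::Device>(pipeline, device_info); }`. Because the failure still reproduces with a 20 s delay, this may only make the bug surface less often rather than fix it.

```cpp
#include <cassert>
#include <chrono>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <thread>

// Hypothetical helper: calls `connect` until it returns without throwing
// std::runtime_error, or until another attempt would exceed `deadline`,
// sleeping `retry_interval` between attempts. Returns std::nullopt on timeout.
template <typename Connect>
auto connect_with_retry(Connect connect,
                        std::chrono::milliseconds retry_interval,
                        std::chrono::milliseconds deadline)
    -> std::optional<decltype(connect())> {
  // A zero interval would turn this into a busy loop hammering the device.
  assert(retry_interval.count() > 0 && "retry_interval must be positive");
  // steady_clock is unaffected by wall-clock adjustments during the wait.
  const auto start = std::chrono::steady_clock::now();
  while (true) {
    try {
      return connect();
    } catch (const std::runtime_error& e) {
      // e.g. the X_LINK_DEVICE_NOT_FOUND error reported in this issue
      std::cerr << "connect failed: " << e.what() << "\n";
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    if (elapsed + retry_interval > deadline) {
      return std::nullopt;
    }
    std::this_thread::sleep_for(retry_interval);
  }
}
```

A deadline somewhat longer than the 5 to 10 s watchdog window Erik mentions would give the device time to return to discovery mode before the caller gives up.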

isherman, Mar 08 '22 21:03

@isherman I reproduced your error above on bootloader 0.0.17, but couldn't on 0.0.15, so that's something, I guess.

Adding to the issue: on Windows, with the same software configuration as above (Library 2.15.0, OpenVINO 2021.4, Bootloader 0.0.17), I can flash the bootloader etc., but most of the time I cannot establish or maintain a connection; it errors out with X_LINK_DEVICE_NOT_FOUND.

So far, I've factory reset and reflashed the device following https://docs.luxonis.com/projects/api/en/latest/tutorials/standalone_mode/#factory-reset, with no improvement. Static IP assignment via examples/bootloader/poe_set_ip.py works, and the device is pingable on the new network, but 8 out of 10 times `with dai.Device(pipeline) as device` hangs and then fails with X_LINK_DEVICE_NOT_FOUND. The same hardware now runs depthai_demo.py successfully only about 10% of the time.

On the same Windows machine, an identical OAK-D PoE device with Bootloader 0.0.15 maintains its connection perfectly and recovers from network drops fine. It runs depthai_demo.py without problems every time.

Andrew-Dupuis, Mar 10 '22 22:03

After a bit more testing: downgrading the library to 2.14.1.0 and reflashing the broken device with the 0.0.15 network bootloader made it work reliably again. So some combination of PoE + library 2.15.0 + bootloader 0.0.17 isn't playing nicely. Both devices on 0.0.15 work well with 2.15.0.0 as long as their bootloaders aren't updated.

Andrew-Dupuis, Mar 10 '22 22:03

Thank you for the data points here!

Luxonis-Brandon, Mar 11 '22 00:03