Multiple Zenoh sessions on different transports can fail to declare topic subscriptions when subscribing simultaneously

Open jmonticelli opened this issue 5 months ago • 1 comment

Describe the bug

Hi,

When running two Zenoh sessions on a distributed system, I observed TLS connections being established, subscribers being declared, and multicast transports joining, but no subscriptions being populated for the multicast transport.

Symptoms we observed, explicitly:

  • From Client A's perspective, other clients' multicast/topic1 messages are being received
  • From the other clients' perspective, Client A's multicast/topic2 is not being received, but all other clients' multicast/topic1 publishes are being received
  • Wiresharked data
    • Pcaps show that Client A is sending out a 30 B "join" packet every 2500 ms on udp:/229.12.34.56:12345
    • Pcaps show that Client A is sending out periodic group reports for udp:/229.12.34.56:12345
    • Pcaps show that the other clients are receiving the periodic join multicast packets on udp:/229.12.34.56:12345 from Client A
  • GDB logs showed that we are not locked up in the application layer
  • Trace logs on the clients show that a Client A write is being attempted
  • When this problem is not happening, on Client A, declared subscribers from the first multicast join show subscribers for both multicast/topic1 and multicast/topic2
  • When this problem happens on Client A, it immediately follows TCP session initialization, which declares the TCP subscribers
    • The multicast transport is observed and reported but no subscriptions are declared

To reproduce

  1. Multiple clients in peer mode
  2. Declare MTLS session connect endpoint with multicast scouting, with gossip enabled (disabling gossip did not seem to help)
  3. Declare multicast session listen endpoint with no multicast scouting, with gossip enabled (disabling gossip did not seem to help)
  4. Publish MTLS data?
  5. Publish multicast data
  6. Make sure to print multicast subscriptions
  7. If error is not observed, restart clients by closing processes and restarting them, one at a time.
  8. Eventually, the problem should be observed, but it may require network delay and many clients so that latency can trigger the lockup.
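
For illustration only, steps 1 to 3 might correspond to two per-client session configurations along the lines of the sketch below. This is not the exact configuration from the affected system. The TLS hostname and port are hypothetical placeholders, the helper `session_config` is made up for this example, the TLS/mTLS certificate settings are omitted entirely, and the multicast group is the one seen in the pcaps, written in Zenoh locator form. The sketch only builds and prints the configuration as JSON; it does not open any Zenoh session.

```python
import json


def session_config(*, connect=(), listen=(), multicast_scouting, gossip=True):
    """Build a Zenoh-style config dict for one peer-mode session (hypothetical helper)."""
    return {
        "mode": "peer",  # step 1: every client runs in peer mode
        "connect": {"endpoints": list(connect)},
        "listen": {"endpoints": list(listen)},
        "scouting": {
            "multicast": {"enabled": multicast_scouting},
            "gossip": {"enabled": gossip},
        },
    }


# Step 2: TLS session with a connect endpoint, multicast scouting on, gossip on.
# "peer.example:7447" is an assumed placeholder, not taken from the report.
tls_session = session_config(
    connect=["tls/peer.example:7447"],
    multicast_scouting=True,
)

# Step 3: multicast session with a listen endpoint, multicast scouting off, gossip on.
# The group address and port are the ones reported in the pcaps.
multicast_session = session_config(
    listen=["udp/229.12.34.56:12345"],
    multicast_scouting=False,
)

print(json.dumps(tls_session, indent=2))
print(json.dumps(multicast_session, indent=2))
```

Each printed document could be saved to a file and loaded as the configuration for one of the two sessions, after adding the certificate settings the TLS session needs.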

System info

  • Zenoh 1.2.1
$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

jmonticelli, Jul 17 '25 16:07

Hello @jmonticelli, could you please share the exact Zenoh config you are using to set up the multicast endpoints and multicast scouting?

Furthermore, it's unclear to me what you mean by "Publish MTLS data" and "Publish multicast data". If you could make a repo with code that reproduces the issue it would be ideal.

oteffahi, Jul 31 '25 15:07