[BUG] Confusing cluster names

Open ghost opened this issue 2 weeks ago • 0 comments

Is there an existing issue for this?

[x] I have searched the existing issues

Current Behavior

After digging deeper into the inconsistent node name display behavior across our defiant and enterprise clusters, I’ve discovered a pattern that seems to implicate the gorilla/websocket layer Pulse uses for its live UI updates.

Specifically, node names appear correct immediately after a page loads (e.g., enterprise2 (enterprise2)), but as soon as the WebSocket connection initializes, the names are silently replaced with truncated IPv6 variants (2001:67c:ae8:3:), IPv4-mapped fragments (::ffff:10.), or, in one case, an empty string followed by (enterprise2).

This suggests that the initial REST response contains the correct display names, but the incremental updates pushed over gorilla/websocket are using the wrong field—possibly the raw remoteAddr from the connection handshake, or some internal peer address Pulse shouldn’t be exposing at all.

Even stranger: on the limits configuration screen, two nodes momentarily show a full reverse-lookup PTR string (e.g., b.3.0.…ip6.arpa) right when the WebSocket reconnects, before being overwritten by yet another truncated form. The flash coincides exactly with the gorilla/websocket connection being re-established, so it seems tied to the update pipeline rather than to DNS itself.
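To make the bad values easier to catch in logs or tests, a heuristic check along these lines could flag display names that look like address artifacts. This is only a sketch: looksLikeAddress is a hypothetical helper, not part of Pulse or gorilla/websocket, and it assumes valid node names never contain a colon and never end in a reverse-DNS suffix.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// looksLikeAddress reports whether a display name resembles an address
// artifact instead of a cluster node name. Hypothetical helper, not Pulse code.
func looksLikeAddress(name string) bool {
	n := strings.TrimSpace(name)
	switch {
	case n == "":
		return true // empty name, as in " (enterprise2)"
	case strings.Contains(n, ":"):
		return true // hostnames cannot contain colons: IPv6 literal or fragment
	case net.ParseIP(n) != nil:
		return true // complete IPv4 literal
	}
	// Reverse-lookup PTR names end in one of the two .arpa zones.
	lower := strings.ToLower(strings.TrimSuffix(n, "."))
	return strings.HasSuffix(lower, ".ip6.arpa") || strings.HasSuffix(lower, ".in-addr.arpa")
}

func main() {
	// Sample values taken from the symptoms described in this report.
	for _, name := range []string{"enterprise2", "2001:67c:ae8:3:", "::ffff:10.", ""} {
		fmt.Printf("%q suspicious=%v\n", name, looksLikeAddress(name))
	}
}
```

Running a check like this on every name received over the WebSocket would show exactly which update messages carry the mangled forms.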

Because the WebSocket stream also drives sorting updates, the node list is sorted one way on initial render and another way once live updates kick in, depending on which mangled form the WebSocket payload happens to carry.
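One way the ordering could stay stable regardless of what the live updates put in the display field is to sort on the canonical node name only. The following is a minimal sketch under that assumption; the Node type, its fields, and the node name enterprise1 are hypothetical and not taken from Pulse.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical UI-side record.
type Node struct {
	ID          string // canonical cluster node name, assumed never to change
	DisplayName string // whatever the most recent update put on screen
}

// sortByID orders nodes by canonical name and ignores DisplayName,
// so a mangled display value cannot change the order.
func sortByID(nodes []Node) {
	sort.SliceStable(nodes, func(i, j int) bool {
		return nodes[i].ID < nodes[j].ID
	})
}

func main() {
	nodes := []Node{
		{ID: "enterprise2", DisplayName: "2001:67c:ae8:3:"},
		{ID: "enterprise1", DisplayName: "::ffff:10."}, // enterprise1 is illustrative
	}
	sortByID(nodes)
	for _, n := range nodes {
		fmt.Println(n.ID, n.DisplayName)
	}
}
```

With this approach the list order before and after the WebSocket connects would be identical, even while the displayed names are still wrong.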

At this point it seems likely that Pulse is pulling an address from the wrong gorilla/websocket metadata field and treating it as the canonical node identifier.

Expected Behavior

WebSocket updates should never introduce alternate or derived identifiers. The node name pushed over gorilla/websocket must match the internal cluster node name used elsewhere and must not depend on peer connection metadata, IPs, DNS, or the WebSocket handshake.
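As an illustration of this expectation, the update payload could be built by a function that only ever receives values from cluster configuration and has no access to the connection. This is a sketch, not Pulse's actual message format: NodeUpdate, its JSON field names, and newNodeUpdate are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NodeUpdate is a hypothetical shape for a live-update message.
type NodeUpdate struct {
	Cluster string `json:"cluster"`
	Node    string `json:"node"` // canonical cluster node name
	Status  string `json:"status"`
}

// newNodeUpdate serializes an update from cluster configuration values only.
// It takes no connection argument, so peer metadata such as a remote address
// cannot end up in the payload. An empty node name is rejected outright.
func newNodeUpdate(cluster, node, status string) ([]byte, error) {
	if node == "" {
		return nil, fmt.Errorf("empty node name for cluster %q", cluster)
	}
	return json.Marshal(NodeUpdate{Cluster: cluster, Node: node, Status: status})
}

func main() {
	payload, err := newNodeUpdate("enterprise", "enterprise2", "online")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(payload))
}
```

Rejecting the empty name at construction time would also surface the " (enterprise2)" case as an error instead of letting it reach the UI.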

Steps To Reproduce

1. Configure two clusters with dual-stack IPv4/IPv6.
2. Add nodes during a period of DNS instability.
3. Load Pulse and observe correct names on initial render.
4. Wait for the gorilla/websocket connection to initialize.
5. Observe names silently transforming as live updates arrive.
6. Switch tabs (overview → limits → detail) to see different mangled formats.

Anything else?

Given gorilla/websocket’s behavior here, it’s possible Pulse is accidentally serializing conn.RemoteAddr() or similar into its update messages. Even if this turns out not to be the root cause, the symptoms only appear after the WebSocket layer becomes active, so the interaction is definitely worth investigating.