
Provides SRV support for -e|--directoryaddress option

Open · rdica opened this pull request 1 month ago • 8 comments

Short description of changes

Provides DNS SRV record support for the -e|--directoryaddress option.

CHANGELOG: SKIP

Context

Currently, if the directory server does not use the default port 22124, the -e|--directoryaddress option requires both an IP address or hostname and a port number. This patch lets a server use preconfigured SRV DNS records to connect to a directory without specifying a port number.

The patch builds on the SRV support already present in the client code in main.
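The resolution order the patch implies can be sketched in plain C++. This is a minimal, hypothetical illustration, not the code in util.cpp: `HostPort`, `SrvLookup` and `resolveDirectory` are invented names, the SRV query is passed in as a function so no real DNS is involved, only hostnames and IPv4 addresses are handled, and it assumes (as the later comments suggest) that an SRV query is only attempted when no port is given.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical types for illustration only; not the Jamulus API.
struct HostPort {
    std::string   host;
    std::uint16_t port;
};

// Default directory port used when neither a port nor an SRV record is available.
constexpr std::uint16_t kDefaultDirectoryPort = 22124;

// Pluggable SRV query: takes a name such as "_jamulus._udp.anygenre2.jamulus.io"
// and returns the record's target and port, or std::nullopt if there is none.
using SrvLookup = std::function<std::optional<HostPort>(const std::string&)>;

// Resolve a -e|--directoryaddress style argument
// (hostnames/IPv4 only, no port range validation in this sketch).
HostPort resolveDirectory(const std::string& arg, const SrvLookup& lookupSrv) {
    const auto colon = arg.rfind(':');
    if (colon != std::string::npos) {
        // Explicit port given: use it directly, no SRV query.
        const auto port = static_cast<std::uint16_t>(std::stoul(arg.substr(colon + 1)));
        return {arg.substr(0, colon), port};
    }
    // No port: prefer the SRV record _jamulus._udp.<host>.
    if (auto srv = lookupSrv("_jamulus._udp." + arg)) {
        return *srv;
    }
    // No SRV record either: fall back to the default port.
    return {arg, kDefaultDirectoryPort};
}
```

Because the explicit-port branch never queries DNS, an address that already carries a port skips the SRV path entirely.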

For testing, I have created SRV DNS records that point to each of the seven public directory servers provided by the Jamulus team:

  • anygenre1.jamulusjams.com
  • anygenre2.jamulusjams.com
  • anygenre3.jamulusjams.com
  • rock.jamulusjams.com
  • jazz.jamulusjams.com
  • classical.jamulusjams.com
  • choral.jamulusjams.com

You can confirm the SRV records as follows.

Mac/Linux:

dig _jamulus._udp.anygenre1.jamulusjams.com srv

;; ANSWER SECTION:
_jamulus._udp.anygenre1.jamulusjams.com. 3600 IN SRV 0 0 22124 anygenre1.jamulus.io.

Windows:

nslookup -type=srv _jamulus._udp.anygenre1.jamulusjams.com

Server:  UnKnown
Address:  10.2.0.1

_jamulus._udp.anygenre1.jamulusjams.com SRV service location:
          priority       = 0
          weight         = 0
          port           = 22124
          svr hostname   = anygenre1.jamulus.io
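To check these records from a script rather than by eye, a minimal sketch like the following could parse one SRV line from dig's ANSWER SECTION. It does not handle nslookup's multi-line format, and `SrvAnswer` and `parseDigSrvLine` are hypothetical names.

```cpp
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical result type for one SRV answer line.
struct SrvAnswer {
    std::string   name;
    std::uint32_t ttl;
    std::uint16_t priority;
    std::uint16_t weight;
    std::uint16_t port;
    std::string   target;
};

// Parse a dig answer line of the form
// "<name> <ttl> IN SRV <priority> <weight> <port> <target>".
std::optional<SrvAnswer> parseDigSrvLine(const std::string& line) {
    std::istringstream in(line);
    std::string name, cls, type, target;
    unsigned long ttl = 0, priority = 0, weight = 0, port = 0;
    if (!(in >> name >> ttl >> cls >> type >> priority >> weight >> port >> target)) {
        return std::nullopt;  // comment lines, headers, or truncated records
    }
    if (cls != "IN" || type != "SRV" || priority > 0xFFFF || weight > 0xFFFF || port > 0xFFFF) {
        return std::nullopt;
    }
    if (target.back() == '.') {
        target.pop_back();  // drop the trailing root dot of the FQDN
    }
    return SrvAnswer{name, static_cast<std::uint32_t>(ttl),
                     static_cast<std::uint16_t>(priority),
                     static_cast<std::uint16_t>(weight),
                     static_cast<std::uint16_t>(port), target};
}
```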

To use this functionality for the public Jamulus directories, the Jamulus team could create the seven SRV records and publish them in the same table that lists the server host/port pairs at https://jamulus.io/wiki/Running-a-Server#registered-mode

Does this change need documentation? What needs to be documented and how?

Unsure. The --help output for -c|--connect does not mention SRV support, so by the same reasoning I suggest adding nothing for -e|--directoryaddress either.

Status of this Pull Request

What is missing until this pull request can be merged?

Checklist

  • [x] I've verified that this Pull Request follows the general code principles
  • [x] I tested my code and it does what I want
  • [x] My code follows the style guide
  • [x] I waited some time after this Pull Request was opened and all GitHub checks completed without errors.
  • [x] I've filled all the content above

Posted by rdica, Nov 06 '25

CC @gilgongo and @softins for DNS

Posted by ann0see, Nov 09 '25

I have just created SRV records in our zone on Cloudflare for the various directories, as follows:

;; SRV Records
_jamulus._udp.anygenre1.jamulus.io.     60      IN      SRV     0 0 22124 anygenre1.jamulus.io.
_jamulus._udp.anygenre2.jamulus.io.     60      IN      SRV     0 0 22224 anygenre2.jamulus.io.
_jamulus._udp.anygenre3.jamulus.io.     60      IN      SRV     0 0 22624 anygenre3.jamulus.io.
_jamulus._udp.choral.jamulus.io.        60      IN      SRV     0 0 22724 choral.jamulus.io.
_jamulus._udp.classical.jamulus.io.     60      IN      SRV     0 0 22524 classical.jamulus.io.
_jamulus._udp.jazz.jamulus.io.          60      IN      SRV     0 0 22324 jazz.jamulus.io.
_jamulus._udp.private.jamulus.io.       60      IN      SRV     0 0 22124 private.jamulus.io.
_jamulus._udp.rock.jamulus.io.          60      IN      SRV     0 0 22424 rock.jamulus.io.

Once we are happy they are working correctly, we can wind the TTLs up from 1 minute to something longer.

Posted by softins, Nov 10 '25

@softins Thanks. I have reconfigured one of my servers to use the SRV record for anygenre2.jamulus.io and am monitoring it.

Nov 10 18:21:03 daw6 jamulus[279573]: resolved anygenre2.jamulus.io to a single SRV record: anygenre2.jamulus.io:22224
Nov 10 18:21:03 daw6 jamulus[279573]: Server Registration Status update: Registration requested
Nov 10 18:21:03 daw6 jamulus[279573]: Server Registration Status update: Registered
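The log above reports a single SRV record. If a name ever has several, RFC 2782 says clients should prefer the lowest priority value and choose among equal-priority records at random, weighted by the weight field. The sketch below picks one record that way; it uses hypothetical names and is not a description of how the patch or Qt handles multiple records.

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <vector>

struct SrvRecord {
    std::string   target;
    std::uint16_t port;
    std::uint16_t priority;
    std::uint16_t weight;
};

// Pick one record following RFC 2782: lowest priority value first, then a
// weighted random choice among records sharing that priority.
std::optional<SrvRecord> selectSrvRecord(const std::vector<SrvRecord>& records, std::mt19937& rng) {
    if (records.empty()) return std::nullopt;

    const auto best = std::min_element(records.begin(), records.end(),
        [](const SrvRecord& a, const SrvRecord& b) { return a.priority < b.priority; })->priority;

    std::vector<SrvRecord> group;
    for (const auto& r : records)
        if (r.priority == best) group.push_back(r);

    // Zero-weight records go first so they keep a small chance of selection.
    std::stable_partition(group.begin(), group.end(),
                          [](const SrvRecord& r) { return r.weight == 0; });

    std::uint32_t total = 0;
    for (const auto& r : group) total += r.weight;

    // Random value in [0, total]; take the first record whose running sum reaches it.
    std::uniform_int_distribution<std::uint32_t> dist(0, total);
    const std::uint32_t pick = dist(rng);

    std::uint32_t running = 0;
    for (const auto& r : group) {
        running += r.weight;
        if (running >= pick) return r;
    }
    return group.back();  // not reached: running ends at total >= pick
}
```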

Posted by rdica, Nov 10 '25

Although the SRV lookups work and servers can register with a directory server, all my test servers eventually lose their registrations at some indeterminate point within 24 hours, after which re-registration attempts stop being logged. Any clients connected to the affected server are also disconnected. I'm investigating further.

Posted by rdica, Nov 14 '25

I still haven't determined why servers that register with a directory via its SRV record eventually lose their registration. I have also found that when the new RPC method is used to enable, disable, or change the directory server to Any Genre 1, the SRV record is used, so the server eventually loses its registration and is no longer connected to the directory. The other genres appear to be unaffected because their address and port pairs are explicitly defined in global.h, so no SRV lookup is performed on those hostnames.

Posted by rdica, Dec 04 '25

This is an interesting one. I would be interested to try to understand and diagnose it, as my available time permits. We need to do so before we can confidently add this feature.

Posted by softins, Dec 08 '25

@rdica To make the problem appear sooner, you could change SERVLIST_REGIST_INTERV_MINUTES in global.h from 15 to 1 minute. If clients get disconnected at the same time, the Jamulus server may be locking up. It would be worth checking its memory consumption over time, particularly after it has stopped working.

I am building a server with this change and some extra debug output to see what happens.

I am rather suspicious of this code: https://github.com/jamulussoftware/jamulus/blob/cb5a880ecf0de5e0ac052e133c341a1237c3be0b/src/util.cpp#L760-L774

I think deleteLater() relies on an event loop to perform the deletion. When our ParseNetworkAddressSrv() resolver is called from the connect dialog, that probably works fine, and since the function is only called once per connection, its resource use is light anyway. When it is called by a headless server, however, it seems plausible that the deleteLater() queue is not serviced in the same way, and there the resolver is called repeatedly, once for each registration refresh.

However, at the moment, that is only an educated guess.
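The deleteLater() hypothesis can be illustrated with a toy model in standard C++, with no Qt involved: objects scheduled for deletion are only freed when the event loop runs, so a caller that never services the loop accumulates one object per registration refresh. `ToyEventLoop`, `FakeLookup` and `refreshRegistration` are hypothetical stand-ins for QObject::deleteLater() and the resolver, and the next comment suggests the real cause lies elsewhere.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy stand-in for one DNS lookup object; counts live instances.
struct FakeLookup {
    static inline std::size_t live = 0;
    FakeLookup() { ++live; }
    ~FakeLookup() { --live; }
    FakeLookup(const FakeLookup&) = delete;
    FakeLookup& operator=(const FakeLookup&) = delete;
};

// Toy deferred deletion: objects handed to deleteLater() are only destroyed
// when processEvents() runs, loosely mirroring QObject::deleteLater().
class ToyEventLoop {
public:
    void deleteLater(std::unique_ptr<FakeLookup> obj) { pending_.push_back(std::move(obj)); }
    void processEvents() { pending_.clear(); }
    std::size_t pendingCount() const { return pending_.size(); }

private:
    std::vector<std::unique_ptr<FakeLookup>> pending_;
};

// One registration refresh: create a lookup, use it, schedule its deletion.
void refreshRegistration(ToyEventLoop& loop) {
    auto lookup = std::make_unique<FakeLookup>();
    // (the real resolver would issue the SRV query and use the result here)
    loop.deleteLater(std::move(lookup));
}
```

If processEvents() is never called, 60 refreshes leave 60 live lookup objects; one call frees them all.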

Posted by softins, Dec 08 '25

I now think the cause is different from what I suggested above. I ran two servers on the same machine (one headless, one GUI), both registered to private.jamulus.io with a refresh interval of 1 minute. Both lasted about an hour before they stopped registering, and I then found that both were completely unresponsive, probably deadlocked in some way. The GUI server would not respond to any action, and neither server would respond to Ctrl-C or a plain kill from the command line; I had to use kill -9. More investigation is needed.

It may be that we just need to try a different implementation of the DNS lookup for SRV. I'm investigating that possibility too.
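Whatever lookup implementation is chosen, one defensive pattern (not proposed in the thread) is to run a blocking resolver call, for example res_query() from <resolv.h>, on a worker thread and give up after a timeout, so that a lookup which never returns does not block the calling thread. A minimal sketch, with the hypothetical name `lookupWithTimeout` and the lookup passed in as a function:

```cpp
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <optional>
#include <string>
#include <thread>
#include <utility>

// Run a possibly-blocking lookup on a detached worker thread and wait at most
// `timeout` for the result. On timeout the caller gets std::nullopt and can
// fall back (e.g. to a hard-coded host:port); the stuck worker is abandoned.
std::optional<std::string> lookupWithTimeout(std::function<std::string()> lookup,
                                             std::chrono::milliseconds timeout) {
    std::promise<std::string> promise;
    std::future<std::string> result = promise.get_future();

    // A detached std::thread is used instead of std::async because the future
    // returned by std::async blocks in its destructor until the task finishes,
    // which would reintroduce the hang.
    std::thread([p = std::move(promise), fn = std::move(lookup)]() mutable {
        try {
            p.set_value(fn());
        } catch (...) {
            p.set_exception(std::current_exception());
        }
    }).detach();

    if (result.wait_for(timeout) != std::future_status::ready) return std::nullopt;
    try {
        return result.get();
    } catch (...) {
        return std::nullopt;  // lookup threw: treat like "no answer"
    }
}
```

The abandoned worker thread leaks if the lookup never returns, so this limits the damage rather than fixing the underlying lock-up.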

Posted by softins, Dec 09 '25

I modified global.h as suggested and saw the same behavior. I'm at a loss as to how to troubleshoot further.

@softins Seeing as this isn't working as expected, shall I close this PR?

Posted by rdica, Dec 22 '25

I think you should leave it open, but set it to draft. There is nothing wrong with your change; the problem is deeper.

I have been looking at this recently and found that something in Qt's QDnsLookup locks up on exactly the 60th invocation. I haven't yet found why, but I have had detailed conversations with ChatGPT about possible changes in approach.

I'm still looking at it as time permits.
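If the lock-up really occurs on the 60th lookup, the timings reported in this thread are consistent with it. Assuming one lookup at startup and one per registration refresh (an assumption, not something measured here), the 60th lookup happens after 59 refresh intervals:

```cpp
// Minutes from startup until the n-th SRV lookup, assuming one lookup at
// startup (t = 0) and one more per registration refresh.
constexpr int minutesUntilNthLookup(int refreshIntervalMinutes, int n) {
    return refreshIntervalMinutes * (n - 1);
}

// Default interval (SERVLIST_REGIST_INTERV_MINUTES = 15): 885 min, about 14.75 h.
static_assert(minutesUntilNthLookup(15, 60) == 885);
// Interval lowered to 1 minute for testing: 59 min, about an hour.
static_assert(minutesUntilNthLookup(1, 60) == 59);
```

That fits both the "within 24 hrs" failures at the default interval and the "about an hour" failures with a 1-minute interval.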

Posted by softins, Dec 22 '25

Fair enough. I have converted to draft, thanks.

Posted by rdica, Dec 22 '25