Provides SRV support for -e|--directoryaddress option
Short description of changes
Provides SRV DNS support for -e|--directoryaddress option.
CHANGELOG: SKIP
Context
Currently one needs to provide both an IP/host and a port number to the -e|--directoryaddress option if the directory server is not using the default port 22124. This patch lets a server use preconfigured SRV DNS records to connect to a directory without having to provide a port number.
The patch expands on SRV support in client code already in main.
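As a rough illustration of the behaviour described above (not the actual Jamulus C++ code), the following sketch shows one way an address without a port could be resolved: try the `_jamulus._udp.<host>` SRV name first, then fall back to the default port. The function names and the `lookup_srv` callable are hypothetical; a real implementation would plug in an actual DNS resolver. Only hostnames and IPv4 addresses are handled, and weight-based selection among SRV records is omitted.

```python
from typing import Callable, List, Tuple

DEFAULT_PORT = 22124  # default port, as stated in the description above

# One SRV answer reduced to (priority, weight, port, target).
SrvRecord = Tuple[int, int, int, str]


def srv_query_name(host: str) -> str:
    """Build the SRV owner name to query for a given host."""
    return f"_jamulus._udp.{host.rstrip('.')}"


def resolve_directory(
    address: str,
    lookup_srv: Callable[[str], List[SrvRecord]],  # hypothetical resolver hook
) -> Tuple[str, int]:
    """Return (host, port) for a directory address given as 'host' or 'host:port'."""
    host, sep, port = address.rpartition(":")
    if sep and port.isdigit():
        # An explicit port always wins; no SRV lookup is attempted.
        return host, int(port)

    records = lookup_srv(srv_query_name(address))
    if records:
        # Pick the record with the lowest priority value.
        _, _, srv_port, target = min(records, key=lambda r: r[0])
        return target.rstrip("."), srv_port

    # No SRV record found: fall back to the host itself on the default port.
    return address, DEFAULT_PORT
```

With a stub resolver that returns `(0, 0, 22224, "anygenre2.jamulus.io.")` for `_jamulus._udp.anygenre2.jamulus.io`, calling `resolve_directory("anygenre2.jamulus.io", stub)` yields the host `anygenre2.jamulus.io` and port `22224`.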
I have created SRV DNS records to test with that point to each of the seven public directory servers provided by the Jamulus team:
- anygenre1.jamulusjams.com
- anygenre2.jamulusjams.com
- anygenre3.jamulusjams.com
- rock.jamulusjams.com
- jazz.jamulusjams.com
- classical.jamulusjams.com
- choral.jamulusjams.com
You can confirm the SRV records using the following:
Mac/Linux
dig _jamulus._udp.anygenre1.jamulusjams.com srv
;; ANSWER SECTION:
_jamulus._udp.anygenre1.jamulusjams.com. 3600 IN SRV 0 0 22124 anygenre1.jamulus.io.
Windows
nslookup -type=srv _jamulus._udp.anygenre1.jamulusjams.com
Server: UnKnown
Address: 10.2.0.1
_jamulus._udp.anygenre1.jamulusjams.com SRV service location:
priority = 0
weight = 0
port = 22124
svr hostname = anygenre1.jamulus.io
In order to utilize this functionality for the Jamulus public space, the Jamulus team could create the seven SRV records and publish those in the same table that displays the server host/port pairs in https://jamulus.io/wiki/Running-a-Server#registered-mode
Does this change need documentation? What needs to be documented and how?
Unsure. According to --help output, the option -c|--connect doesn't mention anything about SRV support. In that same vein I submit nothing should be added for -e|--directoryaddress either.
Status of this Pull Request
What is missing until this pull request can be merged?
Checklist
- [x] I've verified that this Pull Request follows the general code principles
- [x] I tested my code and it does what I want
- [x] My code follows the style guide
- [x] I waited some time after this Pull Request was opened and all GitHub checks completed without errors.
- [x] I've filled all the content above
CC @gilgongo and @softins for DNS
I have just created SRV records in our zone on Cloudflare for the various directories, as follows:
;; SRV Records
_jamulus._udp.anygenre1.jamulus.io. 60 IN SRV 0 0 22124 anygenre1.jamulus.io.
_jamulus._udp.anygenre2.jamulus.io. 60 IN SRV 0 0 22224 anygenre2.jamulus.io.
_jamulus._udp.anygenre3.jamulus.io. 60 IN SRV 0 0 22624 anygenre3.jamulus.io.
_jamulus._udp.choral.jamulus.io. 60 IN SRV 0 0 22724 choral.jamulus.io.
_jamulus._udp.classical.jamulus.io. 60 IN SRV 0 0 22524 classical.jamulus.io.
_jamulus._udp.jazz.jamulus.io. 60 IN SRV 0 0 22324 jazz.jamulus.io.
_jamulus._udp.private.jamulus.io. 60 IN SRV 0 0 22124 private.jamulus.io.
_jamulus._udp.rock.jamulus.io. 60 IN SRV 0 0 22424 rock.jamulus.io.
Once we are happy they are working correctly, we can wind the TTLs up from 1 minute to something longer.
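For anyone who wants to check the records above programmatically, here is a minimal sketch that splits a zone-file style SRV line into its fields. The `SrvRecord` type and `parse_srv_line` function are hypothetical helpers for illustration, and the parser assumes the exact eight-column layout shown in the listing (name, TTL, class, type, priority, weight, port, target).

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    name: str
    ttl: int
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_line(line: str) -> SrvRecord:
    """Parse one 'name TTL IN SRV priority weight port target' line."""
    name, ttl, rr_class, rr_type, priority, weight, port, target = line.split()
    if (rr_class, rr_type) != ("IN", "SRV"):
        raise ValueError(f"not an IN SRV record: {line!r}")
    return SrvRecord(name, int(ttl), int(priority), int(weight), int(port), target)


# Two lines copied from the listing above.
records = [
    parse_srv_line("_jamulus._udp.anygenre2.jamulus.io. 60 IN SRV 0 0 22224 anygenre2.jamulus.io."),
    parse_srv_line("_jamulus._udp.rock.jamulus.io. 60 IN SRV 0 0 22424 rock.jamulus.io."),
]
for rec in records:
    print(f"{rec.target} -> port {rec.port} (TTL {rec.ttl}s)")
```

Keeping the TTL as a parsed field makes it easy to confirm later that the values have been raised from 60 seconds.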
@softins thanks, I have reconfigured one of my servers to use SRV for anygenre2.jamulus.io, monitoring.
Nov 10 18:21:03 daw6 jamulus[279573]: resolved anygenre2.jamulus.io to a single SRV record: anygenre2.jamulus.io:22224
Nov 10 18:21:03 daw6 jamulus[279573]: Server Registration Status update: Registration requested
Nov 10 18:21:03 daw6 jamulus[279573]: Server Registration Status update: Registered
While the SRV lookups work and servers can connect to a directory server, all my test servers eventually lose their ability to maintain their registrations, at some indeterminate time within 24 hours, and attempts to re-register stop being logged. Any clients that were connected to the server itself also get disconnected. Attempting to investigate further...
I still haven't been able to determine why servers that use an SRV record to register with a directory eventually lose their connection to it. I also found that when the new RPC method is used to enable, disable or change the directory server to Any Genre 1, the SRV record is used, so the server eventually loses its registration and is no longer connected to the directory. Other genres apparently aren't affected, because their address and port pairs are explicitly defined as single objects in global.h, so SRV lookups aren't performed on those hostnames.
This is an interesting one. I would be interested to try to understand and diagnose it, as my available time permits. We need to do so before we can confidently add this feature.
@rdica to make the problem potentially happen more quickly, you could change SERVLIST_REGIST_INTERV_MINUTES from 15 to 1 minute in global.h. If you find clients get disconnected at the same time, it could be that the Jamulus server is getting locked up. It would be worth checking its memory consumption over time, and particularly when it has stopped working.
I am building a server with this change, and some extra debug output, to see.
I am rather suspicious of this code: https://github.com/jamulussoftware/jamulus/blob/cb5a880ecf0de5e0ac052e133c341a1237c3be0b/src/util.cpp#L760-L774
I think deleteLater() is relying on an event loop to perform the deletion. When our ParseNetworkAddressSrv() resolver is called from the connect dialog, that may well happen ok, although since this function will only be called once when connecting, it will be light on resources anyway. However, when called by a headless server, it seems plausible that the deleteLater queue might not get serviced in the same way. And that resolver function gets called repeatedly, once for each registration refresh.
However, at the moment, that is only an educated guess.
I think it's something different from what I thought above. I ran two servers - one headless and one GUI - on the same machine, both registered to private.jamulus.io with a refresh interval of 1 minute. They both lasted about an hour or so before they stopped registering. I then found that they were both completely unresponsive, probably deadlocked in some way. The GUI server would not respond to any action, and neither server would respond to control-C to terminate, nor a plain kill from the command line. I needed to use kill -9. More investigation needed.
It may be that we just need to try a different implementation of the DNS lookup for SRV. I'm investigating that possibility too.
I modified global.h as suggested and experienced the same thing. I'm at a loss as to further troubleshooting.
@softins Seeing as this isn't working as expected, shall I close this MR?
I think you should leave it open, but set it to draft. There is nothing wrong with your change; the problem is deeper.
I have been looking at this recently, and found there is something in Qt's QDnsLookup that locks up on exactly the 60th invocation. I haven't yet found why, but have had detailed conversations with ChatGPT about possible changes in approach.
I'm still looking at it as time permits.
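One generic way to pin down which invocation locks up is to run the call repeatedly under a watchdog timeout and report the first call that does not return. The sketch below is a language-neutral illustration in Python, not code from this PR: `first_hanging_call` is a hypothetical helper, and `action` is a stand-in for whatever performs the lookup (it does not call Qt's QDnsLookup).

```python
import threading
from typing import Callable, Optional


def first_hanging_call(
    action: Callable[[int], None],  # stand-in for the lookup under test
    max_calls: int,
    timeout_s: float,
) -> Optional[int]:
    """Call action(1), action(2), ... and return the first call number that
    fails to finish within timeout_s, or None if every call returns in time."""
    for n in range(1, max_calls + 1):
        # A daemon thread lets the process exit even if the call never returns.
        worker = threading.Thread(target=action, args=(n,), daemon=True)
        worker.start()
        worker.join(timeout_s)
        if worker.is_alive():
            return n
    return None
```

Running each call in its own daemon thread keeps the harness responsive even when the call under test is deadlocked, which matches the symptom of the process ignoring control-C.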
Fair enough. I have converted to draft, thanks.