gluon-online-status concept
As we closed #1930 and #1684 today with a reference to an ongoing IRC discussion, I want to present the conclusion of that discussion.
We think that having all nodes ping into the world at regular intervals is not something we would like to see in a first implementation of this feature. Instead, we would like to focus on information that we can cheaply derive from the local node, and offer a range of flags that packages can easily test to see whether the node is in a required state.
For that, we define a directory /var/gluon/online/ that carries empty marker files. For our initial proposal, we think of two simple markers that we would like to see in the first version:
- neighbors or mesh, to reflect that the node has neighbors that it meshes with
- route_default4/6 or default_gw4/6, to reflect that the respective network stack has a default route
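As a rough illustration of the second marker, a default route check could simply look at the kernel routing table. This is a minimal sketch, assuming an ip command that supports "ip -4/-6 route show default"; the function name has_default_route is hypothetical, and for the IPv4 gateway in batman-adv networks the thread favors batctl gwl instead.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given address family (4 or 6)
# has at least one default route in the main routing table.
has_default_route() {
	family="$1"
	# "ip -4 route show default" prints nothing when no default route exists.
	[ -n "$(ip -"$family" route show default 2>/dev/null)" ]
}
```

Usage: has_default_route 6 && echo "IPv6 default route present"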
In later versions, this set can be extended; for example, I could imagine exposing whether we have an active NTP sync. This would also be open to contributions of markers through community packages.
A script should run periodically, call the canonical checks of the mesh protocol's provider and, based on their results, create marker files that do not exist yet or delete those that are no longer valid. I think it would be helpful not to touch already existing markers, so that we can look at the mtime of these markers to know when they entered a certain state.
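The update step described above could be sketched like this. It is a hypothetical illustration, not Gluon's actual code: MARKER_DIR and update_marker are made-up names, and the check is any command that exits 0 when the state holds.

```shell
#!/bin/sh
# MARKER_DIR defaults to the proposed /var/gluon/online and can be overridden.
MARKER_DIR="${MARKER_DIR:-/var/gluon/online}"

# update_marker NAME CHECK_COMMAND [ARGS...]
# Runs the check; creates the marker only if it is missing (so its mtime
# keeps recording when the state was entered) and removes it once the
# check fails.
update_marker() {
	name="$1"
	shift
	mkdir -p "$MARKER_DIR"
	if "$@"; then
		[ -e "$MARKER_DIR/$name" ] || : > "$MARKER_DIR/$name"
	else
		rm -f "$MARKER_DIR/$name"
	fi
}
```

A cron job could then call, e.g., update_marker neighbors my_neighbor_check, where my_neighbor_check is a placeholder for whatever check the mesh protocol provider offers.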
For batman-adv we think batctl n and batctl gwl are strong contenders, and for babeld something can be grepped from the output of the dump command on its control socket.
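A neighbor check built on batctl n might look roughly like the sketch below. It is hypothetical (not the code that was later merged) and rests on an assumption about the output format: the bracketed header line starts with "[", and every neighbor entry contains the neighbor's MAC address while the column header contains none. A gateway check via batctl gwl would be analogous, but its output format is not assumed here.

```shell
#!/bin/sh
# Hypothetical batman-adv neighbor check based on "batctl n".
# Assumption: drop the "[B.A.T.M.A.N. adv ...]" header line, then any
# remaining line containing a MAC address is a neighbor entry.
batadv_has_neighbors() {
	batctl n 2>/dev/null \
		| grep -v '^\[' \
		| grep -Eqi '([0-9a-f]{2}:){5}[0-9a-f]{2}'
}
```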
N.B.: I'm not sure what state the Babel setup is in, and I wouldn't want to make it a mandatory part of the implementation if, as I have recently heard, it hasn't built for several releases without anybody noticing.
@NeoRaider @blocktrron @AiyionPrime @T-X I hope this summarises what we talked about; if not, feel free to add your clarification below.
> I think it would be helpful not to touch already existing markers, so that we can look at the mtime of these markers to know when they entered a certain state.
On second thought: we should have ctime to read that information, so touching could be done unconditionally.
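A consumer could read when a marker entered its state from its timestamp. One caveat: on Linux, touching an existing file also updates its ctime (any timestamp change is an inode change), so timestamps only carry that information if existing markers are left alone. This is a minimal sketch, assuming a date command that supports -r FILE (GNU coreutils and BusyBox both do); marker_since and marker_age are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: prints the Unix time of the marker's last modification,
# which equals its creation time as long as markers are never re-touched.
marker_since() {
	date -r "$1" +%s
}

# Prints how many seconds ago the marker entered its current state.
marker_age() {
	echo $(( $(date +%s) - $(marker_since "$1") ))
}
```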
Suggestion:
/var/gluon/state/has_neighbours
/var/gluon/state/has_default_gw4
/var/gluon/state/has_default_gw6
(React with: 👍 or 👎)
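With the paths suggested above, a package would only need an existence test. This is a minimal sketch; gluon_state_is_set and the GLUON_STATE_DIR override are hypothetical names, with the default taken from the suggestion.

```shell
#!/bin/sh
# Directory holding the state flags; overridable, e.g. for testing.
GLUON_STATE_DIR="${GLUON_STATE_DIR:-/var/gluon/state}"

# Hypothetical helper for packages: succeeds if the given flag file exists.
gluon_state_is_set() {
	[ -e "$GLUON_STATE_DIR/$1" ]
}
```

For example, a package could require both flags with: gluon_state_is_set has_neighbours && gluon_state_is_set has_default_gw4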
Is it correct that batctl gwj and batctl nj won't be available in gluon/tree/master any time soon?
So how will this solution help create an SSID changer? How can we be sure that the node can reach the internet?
> Is it correct that batctl gwj and batctl nj won't be available in gluon/tree/master any time soon?
They will be as soon as we are on OpenWrt 21.02, but apparently we have not yet decided to migrate to it after 2021.1, which I honestly can't understand, because 19.07 has a projected EOL in August.
> So how will this solution help create an SSID changer? How can we be sure that the node can reach the internet?
@rubo77 Not sure it will. For now I am aiming for the three paths above; a resource check using ICMP and external servers might be added later, once the current goal proves to be insufficient.
Nor have I forgotten that there was a huge discussion about what an online checker might check, or that one of its results was that, for the offline-ssid-changer, a ping check is expected to be most useful.
It's just so controversial that I'd like to start with a feature set we can agree on. There are other use cases for which the cheaper checks provided here might be enough. We can start with them and discuss the implementation of the expensive ones later.
> So how will this solution help create an SSID changer?
At the moment there is no way for a batman-adv gateway to signal whether public IPv6/IPv4 connectivity is available through it. As long as that is the case, the offline-SSID package should still test by pinging the public internet.
Why not
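# untested, but hope you get the gist
IFNAME=bat0; DEST=192.0.0.2   # mesh interface and probe target
# announce gateway mode only while the probe succeeds, otherwise withdraw it
(ping -c4 -I "$IFNAME" "$DEST" && batctl meshif "$IFNAME" gw server 100mbit/100mbit) || batctl meshif "$IFNAME" gw off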
in a cron job on your gateway? Then we have signaling based on the gateway mode, and you even get some sort of failover condition.
There are smarter solutions than having hundreds of nodes ping into the internet all day long.
Hanover does this as well.
> Why not
This is what we do, a little more complex, and I consider it just a workaround. Nevertheless, I assume there are people arguing that "being a batman gateway should not be the indicator for offering internet peering". Or, to put it a different way: I wish there were a dedicated way for an offline-ssid package to determine whether internet connectivity is available via the batman network, for example via DNS (a DNSBL-like method).
What goal do you want to achieve with this gluon-online-status? Or, better put: what do you want to collect this information for?
I think gluon-online-status is a bad name, as it suggests that it can actually check whether the internet is reachable, which only a global ping check could, and it does not make clear that this package also does neighbor checks etc.
My idea in https://github.com/freifunk-gluon/community-packages/pull/9 is that it is up to the community to decide at which level they want to do the test. It is possible to define target groups, e.g. local ones (supernodes, nameservers, timeservers) and global ones (publicly pingable servers, servers of other Freifunk communities or similar). The offline-ssid package can then be set to use either the global or the local targets, depending on whether the community thinks it is a bad idea to ping global targets.
My approach could clearly be made more efficient, but it was meant as an RFC right from the beginning. For example, one could do a local respondd query to determine whether another node in the local mesh cloud has already done a successful connection check, so that only one of the nodes pings global targets.
I just don't understand why you want to add all this complexity. From my point of view, ping checks are not costly, and checking for neighbors and route_default4/6 would only add complexity without a proper use case, as a ping would cost nothing if these conditions are not met (because it would not reach the internet at all then).
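The target-group idea could be sketched as below. This is a hypothetical illustration, not the code from the community-packages PR: the variable names are made up and the default addresses are IPv4 documentation placeholders that a community would replace with its own supernodes or public servers.

```shell
#!/bin/sh
# Placeholder target groups (RFC 5737 documentation addresses).
LOCAL_TARGETS="${LOCAL_TARGETS:-192.0.2.1 192.0.2.2}"
GLOBAL_TARGETS="${GLOBAL_TARGETS:-203.0.113.1}"

# any_target_reachable GROUP: succeeds as soon as one target of the chosen
# group ("local" or "global") answers a single ping within 2 seconds;
# returns 2 for an unknown group name.
any_target_reachable() {
	case "$1" in
		local) targets="$LOCAL_TARGETS" ;;
		global) targets="$GLOBAL_TARGETS" ;;
		*) return 2 ;;
	esac
	for target in $targets; do
		ping -c 1 -W 2 "$target" >/dev/null 2>&1 && return 0
	done
	return 1
}
```

An offline-SSID package configured for local checks would then call any_target_reachable local, so a community that objects to pinging global targets never does so.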
The first of the three checks we initially agreed on is now part of master. The other two would follow (at least for batman-adv) in #2274.
Babel's implementation is still missing; if there's interest, I can write those checks as well.
@AiyionPrime What about adding a respondd provider for publishing the gluon-state results?
Personally I do not see a need for that.
I was made aware of the goal to fit respondd responses into a single packet.
> [...], the data should fit in a single unfragmented packet. [...]
Originally posted by @NeoRaider in https://github.com/freifunk-gluon/gluon/issues/2289#issuecomment-895578315
That's something I haven't focused on yet, but will in the future. Other than that, such a provider is out of this issue's scope. If you need it, you might want to open another issue.
What's left to do here is #2274, as well as possibly a counterpart for Babel. Maybe @mweinelt has come to a conclusion; the initial post raises this question in its italic N.B. section.
@AiyionPrime A respondd provider for this would be very useful for diagnostics in the future (maybe not for the currently implemented checks), and I think the overhead is negligible, as these are one-byte values... Isn't there a way to include the requested fields in the query? And if not... why? E.g. the total of transferred bytes or the hostname is something that doesn't need to be transferred every time, only when it changes. If respondd is at its size limit, respondd should be optimized...
I really think query-dependent respondd responses are out of scope for this issue.
I'm not sure what check for has_default_gw4 would make sense in Babel.
@AiyionPrime I don't have access to a Babel-based mesh at the moment. Could you post the output of a babel dump?
Neither have I; I'll try to find a working network again.
Hm, as far as I remember, @mweinelt was under the impression that recent Gluon releases don't build, or at least don't work, with Babel, as no one has worked on or even tested the Babel support in the last ~2 years.
I haven't found a working network with a recent gluon-babel in the past months, so this is more or less stalled by #2353.
I think we need a solution here soon, because the current SSID changer no longer works with the latest Gluon and needs to be adapted.
I think the solution was implemented a while ago. For batman-adv networks, the work has been done since the merge of #2245 and #2274, so if your community uses batman-adv, you're already good to go.
If it uses Babel instead, you only have an IPv6 default route check for now. You can, however, help get #2297 merged by testing how well it works.
The code was written about a year ago; we just haven't encountered anybody with a recent Babel network yet...