Detect loop in the DNS setup
If the DNS name logger.talent-monitoring.com resolves to the proxy itself, countless connections are established through this loop until the system crashes. This is a misconfiguration of the DNS setup; the proxy should recognize it and not establish the outgoing connection. The proxy then keeps running and at least delivers the data to the MQTT broker.
Do you already have an idea how to figure out if the DNS entry resolves to the proxy itself? I assume a DNS resolution check would only be feasible by comparing external and internal DNS server responses, and users might want to opt out by using an external/fixed DNS server. Another option would be to call one of the proxy's own endpoints (i.e. /-/healthy) on the externally resolved address: if it gets an answer, that would be a good indication that we have the bogus condition, but it would also take longer until the timeout at the cloud endpoint is reached.
Do you see any other/ better approach?
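For comparison, one low-level check would be to compare the addresses the resolver returns for the cloud FQDN against the host's own addresses. A minimal Python sketch (function names are mine, not from the proxy); note that this would not catch a Hair-Pin-NAT setup, where the FQDN resolves to a router address rather than the proxy host itself:

```python
import socket

CLOUD_FQDN = "logger.talent-monitoring.com"  # the endpoint the proxy forwards to


def local_addresses() -> set[str]:
    """Best-effort set of this host's own IPv4 addresses."""
    addrs = {"127.0.0.1"}
    try:
        addrs.update(socket.gethostbyname_ex(socket.gethostname())[2])
    except socket.gaierror:
        pass
    return addrs


def resolves_to_self(fqdn: str = CLOUD_FQDN) -> bool:
    """True if the cloud FQDN resolves to one of our own addresses."""
    try:
        infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    except socket.gaierror:
        return False  # cannot resolve at all -> no loop, but also no cloud
    resolved = {info[4][0] for info in infos}
    return bool(resolved & local_addresses())
```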
It is really not simple. Since the user can have firewall rules, NAT, Hair-Pin-NAT etc., I would prefer to send a real packet to the well-known ports 5005 and 10000. So the question is how to mark this packet:
- a wrong encoding -> this can be dangerous, because the inverter or the TSUN cloud might crash
- a special msg type or register -> also dangerous
- for GEN3PLUS we can use a different/wrong CRC algorithm, but this will not work for GEN3
- for GEN3 we can set a magic email address in the first packet, which we can easily detect (e.g. <SNR>@example.com, using a domain reserved by RFC 2606)
So options 3 and 4 are my favorites.
Your approach of using the HTTP endpoint is very good, as we are independent of the inverter protocols. I'm not sure whether a standard Linux system sends a packet addressed to its own IP out to the gateway, or whether it is looped back inside the kernel. In the second case, the test packet cannot be discarded externally, which would be great.
But the IP of the Docker engine need not be the IP from DNS resolution. I use Hair-Pin-NAT, so I resolve logger.talent-monitoring.com to the local IP 172.16.30.x, which is forwarded to the Docker engine with the IP 172.16.20.y.
So this might not work in every case.
Option 1
Let's assume we do a request to logger.talent-monitoring:8127/-/ready. If we have a bogus setup and nothing prevents the connection, we get an "Is ready". If we get a timeout, we don't know whether it's the real cloud or something blocking the request (a firewall).
- [x] Upside: no harm to the TSUN cloud, as the port is blocked
- [ ] Downside: not all circumstances can be detected
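Option 1 could look roughly like this (a sketch only; the `/-/ready` path and port 8127 are taken from the discussion above, and a `False` result is deliberately inconclusive, matching the downside noted here):

```python
import urllib.error
import urllib.request


def probe_ready(fqdn: str, port: int = 8127, timeout: float = 2.0) -> bool:
    """True if our own readiness endpoint answers on the cloud FQDN,
    i.e. the DNS entry very likely points back at the proxy itself."""
    url = f"http://{fqdn}:{port}/-/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, TimeoutError, OSError):
        # Timeout or refusal: either the real cloud or a firewall -> inconclusive.
        return False
```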
Option 2
Let's assume we send a packet (option 3 or 4, or the checksum idea) to the cloud endpoint on either port 5000 or 10000. If we have a bogus setup, the proxy recognizes the CRC, mail or checksum (more on that idea later; no idea if it's feasible). If we don't have a bogus setup, the TSUN cloud needs to digest the wrong packet, and we have to hope nothing breaks there.
- [x] Upside: highest confidence in bogus detection, as this should work for all cases
- [ ] Downside: might break something on the cloud end
Idea: Checksum Workflow
- Proxy receives the first packet from the inverter
- Proxy creates a checksum of the packet
- Proxy tries to send this packet to the cloud
- Proxy checks if packets 2-n received "from the inverter" have the same checksum and assumes a bogus situation if true
- [x] Upside: no harm to the TSUN cloud
- [ ] Downside: not sure whether identical packets come in under normal conditions
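A minimal sketch of the checksum workflow, assuming the proxy can hook its packet send and receive paths (class and method names are mine); a short window of recent digests keeps memory bounded:

```python
import hashlib
from collections import deque


class LoopDetector:
    """Flags a loop when a freshly forwarded packet comes straight back in."""

    def __init__(self, window: int = 8):
        # Digests of the last few packets we sent towards the cloud.
        self._recent = deque(maxlen=window)

    def sent(self, packet: bytes) -> None:
        """Record a packet just forwarded to the cloud endpoint."""
        self._recent.append(hashlib.sha256(packet).digest())

    def looped(self, packet: bytes) -> bool:
        """True if an incoming 'inverter' packet matches one we just sent."""
        return hashlib.sha256(packet).digest() in self._recent
```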
If we have a loop, we will get a lot of new connections from the same endpoint with the same inverter serial number. That can be normal if the inverter has detected a problem and establishes a new connection to solve it; in that case the time between two connections should be more than a second (there must be a timeout). If the time between connections is very small (a few ms), it is probably a loop. The first outgoing connection from the proxy may need a DNS resolution; after that is done, the loop will be very fast.
Detection:
- there are at least 3 connections from inverters with the same serial number
- and the time between the second and third connection establishment is very short
What do you think about this approach?
I'll come back to this topic, as there are always problems with the DNS setup. I have now installed a simple test that checks whether the resolver has returned a private address for the cloud FQDNs. If so, this is certainly not the correct address and most probably even points to the proxy. In this case, I can even switch off forwarding in the current config.
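The private-address test can be done with the standard library alone; `ipaddress.ip_address(...).is_private` also covers loopback and link-local ranges. A sketch of the idea (not the actual implementation from the referenced PR):

```python
import ipaddress
import socket


def resolves_to_private(fqdn: str) -> bool:
    """True if any A record for the cloud FQDN is a private, loopback,
    or otherwise non-routable address - a sure sign of a bogus DNS setup."""
    try:
        infos = socket.getaddrinfo(fqdn, None, family=socket.AF_INET)
    except socket.gaierror:
        return False  # unresolvable is a different failure mode
    return any(
        ipaddress.ip_address(info[4][0]).is_private for info in infos
    )
```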
implemented with #256