RMS icon indicating copy to clipboard operation
RMS copied to clipboard

Logger - socket exception

Open g7gpr opened this issue 1 year ago • 11 comments

https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/Logger.py#L44

On prerelease Pi4, seeing sporadic messages referring to this line.

2024/09/25 09:52:11-INFO-Logger-line:44 - Socket exception: Connection reset by peer (104)

I don't understand what is happening here, apologies for this not being a very helpful submission.

g7gpr avatar Sep 25 '24 04:09 g7gpr

I think the log is doing its job and capturing some sort of error. Any chance you could share the whole log? Is it associated with a video stream disconnect or something?

Cybis320 avatar Sep 25 '24 04:09 Cybis320

2024/09/25 09:10:15-INFO-EventMonitor-line:2144 - Next EventMonitor run : 01:40:15 UTC; 30.0 minutes from now
2024/09/25 09:10:15-INFO-EventMonitor-line:2148 - Next Capture start    : 10:38:49 UTC
2024/09/25 09:40:16-INFO-EventMonitor-line:2144 - Next EventMonitor run : 02:10:16 UTC; 30.0 minutes from now
2024/09/25 09:40:16-INFO-EventMonitor-line:2148 - Next Capture start    : 10:38:49 UTC
2024/09/25 09:52:11-INFO-Logger-line:44 - Socket exception: Connection reset by peer (104)
2024/09/25 10:10:18-INFO-EventMonitor-line:2144 - Next EventMonitor run : 02:40:18 UTC; 30.0 minutes from now
2024/09/25 10:10:18-INFO-EventMonitor-line:2148 - Next Capture start    : 10:38:49 UTC
2024/09/25 10:40:20-INFO-EventMonitor-line:2144 - Next EventMonitor run : 03:10:20 UTC; 30.0 minutes from now
2024/09/25 10:40:20-INFO-EventMonitor-line:2148 - Next Capture start    : 10:38:49 UTC

Not capturing at the moment, unfortunately I can't download the whole log at the moment

g7gpr avatar Sep 25 '24 04:09 g7gpr

oh, interesting. It's no even capturing...

Cybis320 avatar Sep 25 '24 04:09 Cybis320

It could be this bit: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/Formats/ObservationSummary.py#L194-L217

Maybe, wrap it in a try-except block and add a timeout?

def timestampFromNTP(addr='0.us.pool.ntp.org'):
    REF_TIME_1970 = 2208988800  # Reference time
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    
    # Set the socket timeout (e.g., 5 seconds)
    client.settimeout(5)
    
    try:
        data = b'\x1b' + 47 * b'\0'
        client.sendto(data, (addr, 123))
        data, address = client.recvfrom(1024)
        if data:
            t = struct.unpack('!12I', data)[10]
            t -= REF_TIME_1970
            return t
    except socket.error as e:
        print("Socket error: {}".format(e))
        return None
    finally:
        client.close()

Cybis320 avatar Sep 25 '24 05:09 Cybis320

Unlikely, that code has no business running at this stage.

g7gpr avatar Sep 25 '24 05:09 g7gpr

Yes, you are right. I see that now. Sorry.

Maybe something around her?: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/EventMonitor.py#L1292

Cybis320 avatar Sep 25 '24 05:09 Cybis320

That code executes at the start of an EventMonitor run, which is not yet due.

g7gpr avatar Sep 25 '24 05:09 g7gpr

So the error is occurring while the program is in this loop: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/EventMonitor.py#L2127

Which calls this function: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/EventMonitor.py#L2084

Which calls: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/EventMonitor.py#L1292

Which runs: https://github.com/CroatianMeteorNetwork/RMS/blob/4ab05b037ee5ada27cd7df1797e812b7d47b3a53/RMS/EventMonitor.py#L1310

Maybe warp it in a with and catch socket error?:

 try:
    if sys.version_info[0] < 3:
        with urllib2.urlopen(self.syscon.event_monitor_webpage) as response:
            web_page = response.read().splitlines()
    else:
        with urllib.request.urlopen(self.syscon.event_monitor_webpage) as response:
            web_page = response.read().decode("utf-8").splitlines()
except (socket.error, urllib2.URLError, urllib.error.URLError) as e:
    log.error("Socket or URL error in getEventsfromWebPage: {}".format(e))
    return events
except Exception as e:
    log.error("Unexpected error in getEventsfromWebPage: {}".format(e))
    return events

Cybis320 avatar Sep 25 '24 05:09 Cybis320

Error was raised at 09:52:11, that code ran successfully at 09:40:15 and was not scheduled to run again until 10:10:18. Why do you think it is that code that had the exception?

g7gpr avatar Sep 25 '24 07:09 g7gpr

A likely cause of the error is that a socket is being opened but not properly closed by the client. If the client fails to close the socket after completing its operation, the server may eventually close the connection after a period of inactivity, which is what the “Connection reset by peer” message suggests. This typically happens when the server forcefully terminates an idle or incomplete connection.

It’s also important to note that there can be a delay between when the socket is opened and when the server closes it. The message may not appear immediately after the socket is opened but rather after the server detects inactivity or hits a timeout, which can occur several minutes later.

It'd be worth trying using a with statement when opening the socket ensures that the connection is properly closed on the client side once the operation is complete.

Cybis320 avatar Sep 25 '24 14:09 Cybis320

Also, I think the code as it is keeps opening a new socket every 30 minutes without ever closing them.

Cybis320 avatar Sep 25 '24 15:09 Cybis320