
The main capture process dies (memory issue?) but EventMonitor keeps going

Open dvida opened this issue 1 year ago • 7 comments

Perhaps we should modify EventMonitor to also monitor the main capture thread and restart the capture if it's not running.

2024/09/25 06:20:07-INFO-Reprocess-line:327 - Plotting field sums...
2024/09/25 06:21:06-DEBUG-shutil-line:1039 - changing into '/home/rms/RMS_data/CapturedFiles/MA0002_20240924_184839_290063/Fieldsums'
2024/09/25 06:21:06-INFO-shutil-line:899 - Creating tar archive
2024/09/25 06:21:10-DEBUG-shutil-line:1067 - changing back to '/home/rms/source/RMS'
2024/09/25 06:21:10-INFO-Reprocess-line:348 - Making a flat...
CALSTARS file: CALSTARS_MA0002_20240924_184839_290063.txt loaded!
Using 200 files for flat...
/home/rms/Desktop/RMS_StartCapture.sh: line 22:  2905 Killed                  python -m RMS.StartCapture "$@"
Press any key to continue... 2024/09/25 06:45:06-INFO-EventMonitor-line:2143 - Next EventMonitor run : 07:15:06 UTC; 30.0 minutes from now
2024/09/25 06:45:06-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 07:15:07-INFO-EventMonitor-line:2143 - Next EventMonitor run : 07:45:07 UTC; 30.0 minutes from now
2024/09/25 07:15:07-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 07:45:08-INFO-EventMonitor-line:2143 - Next EventMonitor run : 08:15:08 UTC; 30.0 minutes from now
2024/09/25 07:45:08-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 08:15:08-INFO-EventMonitor-line:2143 - Next EventMonitor run : 08:45:08 UTC; 30.0 minutes from now
2024/09/25 08:15:08-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 08:45:09-INFO-EventMonitor-line:2143 - Next EventMonitor run : 09:15:09 UTC; 30.0 minutes from now
2024/09/25 08:45:09-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 09:15:10-INFO-EventMonitor-line:2143 - Next EventMonitor run : 09:45:10 UTC; 30.0 minutes from now
2024/09/25 09:15:10-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 09:45:11-INFO-EventMonitor-line:2143 - Next EventMonitor run : 10:15:11 UTC; 30.0 minutes from now
2024/09/25 09:45:11-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 10:15:12-INFO-EventMonitor-line:2143 - Next EventMonitor run : 10:45:12 UTC; 30.0 minutes from now
2024/09/25 10:15:12-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 10:45:13-INFO-EventMonitor-line:2143 - Next EventMonitor run : 11:15:13 UTC; 30.0 minutes from now
2024/09/25 10:45:13-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 11:15:14-INFO-EventMonitor-line:2143 - Next EventMonitor run : 11:45:14 UTC; 30.0 minutes from now
2024/09/25 11:15:14-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 11:45:14-INFO-EventMonitor-line:2143 - Next EventMonitor run : 12:15:14 UTC; 30.0 minutes from now
2024/09/25 11:45:14-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 12:15:15-INFO-EventMonitor-line:2143 - Next EventMonitor run : 12:45:15 UTC; 30.0 minutes from now
2024/09/25 12:15:15-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 12:45:16-INFO-EventMonitor-line:2143 - Next EventMonitor run : 13:15:16 UTC; 30.0 minutes from now
2024/09/25 12:45:16-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 13:15:17-INFO-EventMonitor-line:2143 - Next EventMonitor run : 13:45:17 UTC; 30.0 minutes from now
2024/09/25 13:15:17-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 13:45:18-INFO-EventMonitor-line:2143 - Next EventMonitor run : 14:15:18 UTC; 30.0 minutes from now
2024/09/25 13:45:18-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 14:15:19-INFO-EventMonitor-line:2143 - Next EventMonitor run : 14:45:19 UTC; 30.0 minutes from now
2024/09/25 14:15:19-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 14:45:20-INFO-EventMonitor-line:2143 - Next EventMonitor run : 15:15:20 UTC; 30.0 minutes from now
2024/09/25 14:45:20-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 15:15:21-INFO-EventMonitor-line:2143 - Next EventMonitor run : 15:45:21 UTC; 30.0 minutes from now
2024/09/25 15:15:21-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 15:45:22-INFO-EventMonitor-line:2143 - Next EventMonitor run : 16:15:22 UTC; 30.0 minutes from now
2024/09/25 15:45:22-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 16:15:23-INFO-EventMonitor-line:2143 - Next EventMonitor run : 16:45:23 UTC; 30.0 minutes from now
2024/09/25 16:15:23-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 16:45:24-INFO-EventMonitor-line:2143 - Next EventMonitor run : 17:15:24 UTC; 30.0 minutes from now
2024/09/25 16:45:24-INFO-EventMonitor-line:2147 - Next Capture start    : 18:47:20 UTC
2024/09/25 17:15:25-INFO-EventMonitor-line:2143 - Next EventMonitor run : 17:45:25 UTC; 30.0 minutes from now
2024/09/25 17:15:25-INFO-EventMonitor-line:2145 - Next Capture start    : 18:47:20 UTC; 91.0 minutes from now
2024/09/25 17:45:26-INFO-EventMonitor-line:2143 - Next EventMonitor run : 18:15:26 UTC; 30.0 minutes from now
2024/09/25 17:45:26-INFO-EventMonitor-line:2145 - Next Capture start    : 18:47:20 UTC; 61.0 minutes from now
2024/09/25 18:15:27-INFO-EventMonitor-line:2143 - Next EventMonitor run : 18:45:27 UTC; 30.0 minutes from now
2024/09/25 18:15:27-INFO-EventMonitor-line:2145 - Next Capture start    : 18:47:20 UTC; 31.0 minutes from now
2024/09/25 18:45:28-INFO-EventMonitor-line:2143 - Next EventMonitor run : 19:15:28 UTC; 30.0 minutes from now
2024/09/25 18:45:28-INFO-EventMonitor-line:2145 - Next Capture start    : 18:47:20 UTC; 1.0 minutes from now
2024/09/25 19:15:29-INFO-EventMonitor-line:2151 - Next EventMonitor run : 19:45:29 UTC 30.0 minutes from now
2024/09/25 19:45:30-INFO-EventMonitor-line:1152 - Added event at 20240925_154025 to the database
2024/09/25 19:45:30-INFO-EventMonitor-line:1879 - Checks on trajectories for event at 20240925_154025
2024/09/25 19:45:30-INFO-EventMonitor-line:1889 - No files for event - marking 20240925_154025 as processed
2024/09/25 19:45:30-INFO-EventMonitor-line:1178 - Event at 20240925_154025 marked as processed
2024/09/25 19:45:30-INFO-EventMonitor-line:2037 - 1 event was processed, EventMonitor work completed
2024/09/25 19:45:30-INFO-EventMonitor-line:2151 - Next EventMonitor run : 20:15:30 UTC 30.0 minutes from now

dvida avatar Sep 29 '24 20:09 dvida

Could be done, but needs care to avoid a recursion trap.

g7gpr avatar Sep 29 '24 21:09 g7gpr

So on launch, EventMonitor gets passed the PID of StartCapture and its command line args. It monitors the process and relaunches it if it dies, but with an added command line switch not to start EventMonitor?
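A rough sketch of that idea in Python, for discussion only. It assumes a POSIX system, and the names `pid_is_running`, `build_relaunch_command`, `check_and_relaunch` and the `--no-event-monitor` switch are hypothetical; none of them exist in RMS today.

```python
import os
import subprocess
import sys


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        # Signal 0 performs the existence/permission check without
        # actually delivering a signal to the process.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by another user.
        return True
    return True


def build_relaunch_command(original_args, no_monitor_flag="--no-event-monitor"):
    """Rebuild the StartCapture command line from the original arguments.

    no_monitor_flag is a hypothetical switch telling the relaunched
    capture not to spawn a second EventMonitor (the recursion trap).
    """
    return [sys.executable, "-m", "RMS.StartCapture"] + list(original_args) + [no_monitor_flag]


def check_and_relaunch(pid, original_args, launcher=subprocess.Popen):
    """Relaunch capture if the watched PID is gone, otherwise do nothing.

    Returns whatever the launcher returns, or None if capture is still alive.
    """
    if pid_is_running(pid):
        return None
    return launcher(build_relaunch_command(original_args))
```

EventMonitor could call something like `check_and_relaunch(capture_pid, capture_args)` once per 30-minute cycle. One caveat of checking by PID alone: a PID can be reused by an unrelated process, so a real implementation would probably also want to verify the process name or start time.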

g7gpr avatar Oct 02 '24 08:10 g7gpr

Either that (i.e. we resurrect the main thread), or we restart the whole thing. I think restarting everything might be a better idea as it will give a fresh start.

dvida avatar Oct 02 '24 22:10 dvida

You mean reboot?

g7gpr avatar Oct 03 '24 08:10 g7gpr

Can we use 'StartCapture -r'? If not, during the default restart, RMS will try to reprocess the data captured so far and send the archive to GMN. I think only the first archive from each night is processed by the GMN solver, so the next capture would normally be ignored by the solver.

satmonkey avatar Oct 03 '24 09:10 satmonkey

Auto reprocess will not start if capture should be running. Server will process captures that do not overlap, though dvida could confirm. But I like -r

g7gpr avatar Oct 03 '24 09:10 g7gpr

I mean if the restart does eventually get triggered.

satmonkey avatar Oct 03 '24 13:10 satmonkey