fah-client-bastet icon indicating copy to clipboard operation
fah-client-bastet copied to clipboard

Log not rotated on Windows, with no new log

Open Hou5e opened this issue 1 year ago • 15 comments
trafficstars

At 0 UTC (5pm PST), the log rotate is initiated, but FAH v8.3.9 on Windows doesn't move to the new log file. @kbernhagen reported that this also happened for FAH v8.3.8 on Windows.

The log shown below is still at: C:\ProgramData\FAHClient\log.txt And, was not moved and renamed with the date to the 'logs' folder.

End of the log looks like this 3 hours later while still folding:

23:56:15:I1:WU1626:Checkpoint completed at step 1050000
23:57:18:I1:WU1626:Completed 1062500 out of 1250000 steps (85%)
23:58:19:I1:WU1626:Completed 1075000 out of 1250000 steps (86%)
23:58:21:I1:WU1626:Checkpoint completed at step 1075000
23:59:23:I1:WU1626:Completed 1087500 out of 1250000 steps (87%)
*********************** Log Rotated 2024-05-08T00:00:24Z ***********************

The web UI looks like this 3 hours later while folding: image

image

Hou5e avatar May 08 '24 03:05 Hou5e

It had the same results for with FAH v8.3.10, on Windows for this issue. The log didn't rotate and the last log entry was 5 hours ago just before 0 UTC while it's still folding.

23:55:38:I1:WU1633:Completed 75000 out of 1250000 steps (6%)
23:55:44:I1:WU1633:Checkpoint completed at step 75000
23:56:38:I1:WU1633:Completed 87500 out of 1250000 steps (7%)
23:57:32:I1:WU1633:Completed 100000 out of 1250000 steps (8%)
23:57:38:I1:WU1633:Checkpoint completed at step 100000
23:58:33:I1:WU1633:Completed 112500 out of 1250000 steps (9%)
23:59:27:I1:WU1633:Completed 125000 out of 1250000 steps (10%)
23:59:33:I1:WU1633:Checkpoint completed at step 125000
*********************** Log Rotated 2024-05-09T00:00:27Z ***********************

image

Hou5e avatar May 09 '24 05:05 Hou5e

Is this from the actual log file or from Web Control's view of it?

jcoffland avatar May 09 '24 13:05 jcoffland

Both, that is what was trying to show. That you can't see anything past 0 UTC in the GUI, and the entire log file for both days looks like this: FAH v8.3.10: log-20240509-190042.txt

FAH v8.3.9: log-20240508-191453.txt

I'll try running the Debug version today to see if there are any clues with that version.

Hou5e avatar May 09 '24 19:05 Hou5e

Those log files look normal. log.txt is moved to logs/log-YYYYMMDD-HHMMSS.txt. The last line is the "Log Rotated" line. Then it should continue with a new log.txt. Are you saying there's no new log.txt?

jcoffland avatar May 09 '24 19:05 jcoffland

The filenames only look normal because I closed FAH, and restarted it, where the log.txt is rotated properly on FAH startup (this restart happened each day to update to the new version of FAH). If I had left FAH running then the file would still be log.txt in the main data folder (not logs/log-YYYYMMDD-HHMMSS.txt), and there would be no new log data after 0 UTC (in the UI or the log file).

Hou5e avatar May 09 '24 21:05 Hou5e

In the initial startup of the Windows FAH log header, is the $( __ $) supposed to be there in the Options: for the Client and CBang sections?

Example from Client section: Options: $( /TP $) /std:c++14 /nologo /EHa /wd4297 /wd4103 /W1 /MTd


Running with a 10 minute log rotate with the Debug version, I didn't see anything new or interesting. The log just stops at the log rotate like the other examples. I stopped and restarted FAH to get it to rotate and make the filename with the date: log-20240510-085635.txt

Hou5e avatar May 10 '24 09:05 Hou5e

Got. That's what I needed to know. So to summarize, on Windows:

  • At log rotation time:
    • The "Log Rotated" message is printed to the log.
    • log.txt is not moved.
    • All logging to log.txt stops and it never recovers.
  • At client start up, log.txt is rotated into logs/ and logging proceeds normally until the next rotation.
  • If you change the log rotation rate then the same bug occurs but at the new rotation time.
  • This started occurring from v8.3.8 on.

jcoffland avatar May 10 '24 11:05 jcoffland

Yep, That's a good summary of the issue.

There are some minor odd side-effects seen with the short log rotation. I restarted FAH to get the next log (this one is not interesting): log-20240510-093235.txt

I restarted FAH again and there were ~3 log rotations before 'Finishing' the WU. This was what was at the end. When the WU finished the log was wiped out with the finishing info (not rotated): log-20240510-120305.txt And, the UI looked like this on the PC that 'Finished' the WU, but FAH was idle and should not say 'Disconnected' for Win10-PC: image

Note: one of the other 'Disconnected' PCs was running: X99-12, and looked like this locally on that PC: image

Hou5e avatar May 10 '24 12:05 Hou5e

For FAH v8.3.11 Build183 (Debug) on Windows, there was a useful error in the GUI log messages, and it didn't get saved to the log.txt file. I missed capturing it from the GUI at 0 UTC when I first saw it, but got it with a 10 minute log rotate:

01:39:22:I1:WU1649:Completed 1112500 out of 1250000 steps (89%)
01:41:05:E :Exception: Failed to rename 'log.txt' to 'logs\log-20240512-014105.txt': The process cannot access the file because it is being used by another process.
01:41:05:E :
01:41:05:E : At: src\cbang\os\SystemUtilities.cpp:729:rename()
01:41:05:E :Caught at: src\cbang\event\Event.cpp:137:call()
01:41:06:E :WU1649:Stream error: 3 

log-20240512-014106_GUI_Text.txt

For comparison, the same section written to the log file looks like this:

01:39:22:I1:WU1649:Completed 1112500 out of 1250000 steps (89%)
*********************** Log Rotated 2024-05-12T01:41:05Z ***********************


If FAH isn't folding (like it was Finishing a WU) then it can rotate the log: GUI messages from the screen: log-20240512-022400_GUI_Text.txt

Log files of the same events: Part 1: log-20240512-022239.txt Part 2: log-20240512-023044.txt

Hou5e avatar May 12 '24 01:05 Hou5e

I finally figured this out. The problem was due to a change in the file handling. If you use the standard C++ std::fstream on Windows there is no easy way to prevent the file handle from being inherited by subprocesses. So when the client spawns the core subprocess it gets a copy of the log file handle. Later, if the core is still running, if you close the log file and try to rotate it, it fails with an access error because the core process still has an open file handle to the log.

The solution was to implement a custom std::streambuf which sets the correct flags when opening the file which prevent the file handle from being inherited by subprocesses. This is actually how it used to work in cbang but I tried to simplify the code which lead to several difficult bugs.

jcoffland avatar May 13 '24 07:05 jcoffland

For FAH v8.3.11 Alpha (Debug) on Windows, you get 2 error messages on FAH startup where it crashes quickly with: image image

Last line of the log: 20:49:52:E :Thread 1 deallocated while still active log-20240513-205115.txt

Hou5e avatar May 13 '24 20:05 Hou5e

Also, both Windows installers {Release & Debug} have the debug browse information file included https://github.com/FoldingAtHome/fah-client-bastet/blob/3917839146eac0e30079900cef812224ce35a2be/install/win/fah-client.nsi#L209 that makes the installer ~3x larger. This should probably be put back to the way it was once these issues get resolved. Or, include the .pdb file for the Debug version, and not include it for the Release version.

The uninstaller section of the script should make sure it deletes that file: Add: Delete "$INSTDIR\FAHClient.pdb" below this line: https://github.com/FoldingAtHome/fah-client-bastet/blob/3917839146eac0e30079900cef812224ce35a2be/install/win/fah-client.nsi#L346

Hou5e avatar May 14 '24 10:05 Hou5e

For FAH v8.3.13 Alpha (Debug) on Windows, it looks like the log rotate is working well and happened at 0UTC. I turned the log rotate on for 10 minute intervals after that, where it has rotated about 50 times while I was testing it.

Hou5e avatar May 16 '24 11:05 Hou5e

[It works too well now... 😄 ] I went back to the default once a day log rotate with the quick WUs turned on with FAH v8.3.13 Alpha (Debug) on Windows there was enough log activity to get (2) log rotations at 0UTC (not sure it it was coincidental or if 2 threads are both trying to do the log rotation??). I truncated the 22MB log to the first few and last few entries: log-20240516-235959.txt Second Log at 0UTC: log-20240517-000000.txt

Hou5e avatar May 17 '24 06:05 Hou5e

This is unrelated. The rotation triggered one second early due to a rounding error. This caused it to trigger again at the correct time. I found and fixed this issue for the next release.

jcoffland avatar May 20 '24 10:05 jcoffland