cx_Freeze icon indicating copy to clipboard operation
cx_Freeze copied to clipboard

Unhandled python exceptions do not result in non-zero exit codes in Win32Services

Open darad opened this issue 4 years ago • 4 comments

We have a python program cx-frozen into a Win32Service. The service must be running all the time and needs to recover from crashes automatically.

In face of unhandled python exceptions (ie fatal error), the running service seems to exit with a zero exit code leading the service state to STOPPED state. This render any of the Windows Service Recovery strategies ineffective as they only respond to non-zero ExitCodes.

As a workaround, we have resorted to a secondary service that's always checking the state of the frozen service. If it's in STOPPED state, it restarts it. That's an extra layer of management we need to maintain.

Can the service api change in a way that the exit codes reflects more closely the outcome of python program's execution?

darad avatar Jan 26 '21 13:01 darad

I think so. You can put more details like a sample? Do your service is based on the sample in the repository? Do you use the initialize, stop, run and state_chaged in the Handler class?

marcelotduarte avatar Jan 26 '21 16:01 marcelotduarte

Hi Marcel,

I spent some time today replicating this.

Using latest release ( 6.3 ) + the service sample i did the following.

  • Replaced Run with:
def Run(self):
        raise RuntimeError("Doomsday")
        self.stopEvent.set()
  • Built the service as usual + installed it via admin cmd prompt
  • I set the recovery options of the service image
  • I start the service by right clicking > start
  • The service immediately stops
  • Checking the log there is the one error
[15008] 2021/02/09 10:29:45.078 starting logging at level ERROR
[15008] 2021/02/09 10:29:45.143 Python exception encountered:
[15008] 2021/02/09 10:29:45.143     Internal Message: exception running service
[15008] 2021/02/09 10:29:45.144     Type => <class 'RuntimeError'>
[15008] 2021/02/09 10:29:45.144     Value => Doomsday
[15008] 2021/02/09 10:29:45.144     Traceback (most recent call last):

[15008] 2021/02/09 10:29:45.144       File "C:\King\repos\git\WinServiceTesting\cx_sample\service\ServiceHandler.py", line 27, in Run
    raise RuntimeError("Doomsday")

[15008] 2021/02/09 10:29:45.146     RuntimeError: Doomsday

[04308] 2021/02/09 10:29:45.147 ending logging
  • Windows makes no attempt to restart the service

In the scenario of an unhandled exception being raised in any of the Services methods ( Install / Start / Stop / Run / state/session change ) i would expect the process to exit with a None-Zero exit code so the windows recovery options can react to the services death.

Having a quick look at the underlying implementation it looks like it will always return 0....

I hope that all makes sense / please say if i we can do anything to help out! And obligatory - thank you so much for your work on this awesome package!

TechnicalPirate avatar Feb 08 '21 16:02 TechnicalPirate

it always returns 0

Yes, when the service run is ok. The second line (and following lines) of the log, are output of: https://github.com/marcelotduarte/cx_Freeze/blob/f5be487e5db58d2a482b272313e6809a06b7e3d0/source/bases/Win32Service.c#L546

And LogPythonException returns -1 https://github.com/anthony-tuininga/cx_Logging/blob/master/src/cx_Logging.c#L1529-L1546

to https://github.com/marcelotduarte/cx_Freeze/blob/f5be487e5db58d2a482b272313e6809a06b7e3d0/source/bases/Win32Service.c#L579-L583 Then you should use state_changed to catch this, and in Config.py it should be set to true (defaults to false) https://github.com/marcelotduarte/cx_Freeze/blob/f5be487e5db58d2a482b272313e6809a06b7e3d0/cx_Freeze/samples/service/Config.py#L45

marcelotduarte avatar Feb 10 '21 05:02 marcelotduarte

Hi Marcel

I tried ( i think! ) your suggestions but am still finding it always exits cleanly and doesn't trigger any recovery actions.

(btw When searching this git project i can't find any references to state_changed anywhere....)

I added a logger to the Handler class to try your suggestion, please find the modified service sample below :)

With this + SESSION_CHANGES set to True - i never see any logs from StateChanged / state_changed or / SessionChanged*( in this sample - in our main product SessionChanged works as expected for logging events like user logins etc )

import threading
import katlog
import os

def _get_logger():
    _log_name = "cx_service_test"
    logger_obj = katlog.get_template_kat_logger(
        _log_name,
        output_directory=katlog.get_kat_tool_logs_subdir(_log_name),
        file_name=_log_name,
        log_file_prefix="pid{}".format(os.getpid()),
    )
    logger_obj.info("LOGGER Created '{}'".format(_log_name))
    return logger_obj


class Handler(object):

    LOGGER = None

    def __init__(self):
        self.stopEvent = threading.Event()
        self.stopRequestedEvent = threading.Event()

    def Initialize(self, configFileName):

        Handler.LOGGER = _get_logger()
        Handler.LOGGER.debug("WinService.Initialize: Loading Virtual Config: {}".format(configFileName))
        Handler.LOGGER.debug("WinService.Initialize: Creating Instance")

    def Run(self):
        Handler.LOGGER.debug("WinService.Run: About To Explode")
        raise RuntimeError("Doomsday")
        Handler.LOGGER.debug("WinService.Run: After Explosion")
        self.stopEvent.set()

    def Stop(self):
        Handler.LOGGER.debug("WinService.Stop: About To Stop")
        self.stopRequestedEvent.set()
        self.stopEvent.wait()

    def StateChanged(self, *args, **kwargs):
        Handler.LOGGER.debug("WinService.StateChanged: State Changed")
        Handler.LOGGER.debug("{}, {}".format(args, kwargs))

    def state_changed(self, *args, **kwargs):
        Handler.LOGGER.debug("WinService.state_changed: State Changed")
        Handler.LOGGER.debug("{}, {}".format(args, kwargs))

    def SessionChanged(self, *args, **kwargs):
        Handler.LOGGER.debug("WinService.SessionChanged: {} {}".format(args, kwargs))

Log Output...

2021-02-10 10:51:50,149: INFO: katLogging: katLogging.py: MainThread: cx_service_test: get_template_kat_logger: 59: Created new Logger cx_service_test
2021-02-10 10:51:50,150: INFO: ServiceHandler: ServiceHandler.py: MainThread: cx_service_test: _get_logger: 22: LOGGER Created 'cx_service_test'
2021-02-10 10:51:50,150: INFO: __init__: __init__.py: MainThread: cx_service_test: enable_api_logger: 25: Connected katapi logger.
2021-02-10 10:51:50,150: DEBUG: ServiceHandler: ServiceHandler.py: MainThread: cx_service_test: Initialize: 40: WinService.Initialize: Loading Virtual Config: C:\King\repos\git\WinServiceTesting\cx_sample\service\build\exe.win32-3.5\cx_FreezeSampleService.ini
2021-02-10 10:51:50,150: DEBUG: ServiceHandler: ServiceHandler.py: MainThread: cx_service_test: Initialize: 41: WinService.Initialize: Creating Instance
2021-02-10 10:51:50,151: DEBUG: ServiceHandler: ServiceHandler.py: MainThread: cx_service_test: Run: 50: WinService.Run: About To Explode

Side note.... Re:

 // run the service 
 if (Service_Run(&info) < 0) { 
     Service_SetStatus(&info, SERVICE_STOPPED); 
     return; 
 } 

Wouldn't this return 0/None/Null?

TechnicalPirate avatar Feb 10 '21 10:02 TechnicalPirate

From a brief reading, it seems that it's possible to set the exit code from a stopped service in the call to Service_SetStatus().

According to the documentation for windows services status, it's possible to use the param dwWin32ExitCode to indicate what the service should report when stopping. Currently, this value is always set to 0.

If the function Service_SetStatus() was given another parameter to indicate the intended exit code, this could likely be fixed. At the moment I'm not able to test, verify and offer a concrete PR for this change - but hopefully, this will make the process easier for the person who's looking into this issue.

johan-ronnkvist avatar Oct 19 '22 07:10 johan-ronnkvist

@darad @TechnicalPirate Can you test the @johron PR?

To install the latest development build: pip install --upgrade --pre --extra-index-url https://marcelotduarte.github.io/packages/ cx_Freeze

marcelotduarte avatar Feb 19 '23 18:02 marcelotduarte

Hey Marcel - thanks for flagging it - I will put it on my TODO list to test.

Thank you for taking this Johron! :)

TechnicalPirate avatar Feb 20 '23 10:02 TechnicalPirate

Hi @marcelotduarte - I spent some time with Johron this week :)

  • We reproduce the issue again on the latest version to confirm nothing had changed in the interim
  • We then restored his PR / brought it up to the latest master
  • Found some issues with it -> which we've now addressed
  • I can now confirm that the PR is fresh & addresses the issue - the expected behavior is now being performed.

TechnicalPirate avatar May 24 '23 07:05 TechnicalPirate

@TechnicalPirate Did you test today's last commit?

marcelotduarte avatar May 24 '23 20:05 marcelotduarte

@marcelotduarte - Yep, we actually ended up on a call to go through and ensure we were both observing the same behaviour prior to making the change -> then confirming service recovery works as expected.

There is one small nuance that's worth commenting on here - even with the latest commit - there is an implicit assumption that the service has to enter a running state for recovery measures to kick-in if the service process encounters an unhandled exception during execution. ( Something we discovered during testing ).

But I can confirm that as of the latest commit a9e0df3 on that PR - everything is tested and working 👍

TechnicalPirate avatar May 24 '23 21:05 TechnicalPirate