event-store-symfony-bundle icon indicating copy to clipboard operation
event-store-symfony-bundle copied to clipboard

Graceful shutdown on interrupt

Open enumag opened this issue 6 years ago • 14 comments

I just encountered an issue where one of my projection was in an invalid state after being interrupted during previous run. The state of the projection was not saved to the database correctly

I believe the projection commands in this bundle could and should catch interruption signals like Ctrl+C and shutdown correctly when it happens using pcntl_signal function. Something like this (by @prolic):

pcntl_signal(SIGINT, function () use ($projection) {
    $projection->stop();
});

I'm not using the projection commands from this bundle at the moment but I'm going to refactor my project to use them soon. I might take a look into this then too.

Any tips what I should be aware of? I never tried working with pcntl_signal function until now.

enumag avatar Apr 25 '18 10:04 enumag

Unfortunately I've never tried workwing with it too. But this is definitively something we should look for.

UFOMelkor avatar Apr 25 '18 12:04 UFOMelkor

@UFOMelkor it's really as simple as:

pcntl_signal(SIGINT, function () use ($projection) {
    $projection->stop();
});

$projection->run($keepRunning);

prolic avatar Apr 25 '18 13:04 prolic

Note:

Ctrl+C - SIGINT Ctrl+\ - SIGQUIT

prolic avatar Apr 25 '18 13:04 prolic

I looked into this for a little bit. It only works when $options[PdoEventStoreReadModelProjector::OPTION_PCNTL_DISPATCH] is set to true. Currently it's impossible to set it that way because AbstractProjectionCommand doesn't pass any options when calling ProjectionManager::createReadModelProjection().

enumag avatar Apr 26 '18 07:04 enumag

Another thing is that pcntl_signal_dispatch(); doesn't seem to be called often enough. I believe it should be added to the end of ReadModelProjector::persist(). Can you think of any possible problems with that?

enumag avatar Apr 26 '18 08:04 enumag

ping @prolic @UFOMelkor

enumag avatar May 02 '18 07:05 enumag

If you want to handle all scenarios for interrupts and means of termination you will have to add additional signals to listen for (SIGTERM, SIGINT, SIGQUIT, (SIGHUB)). Only listening to SIGINT won't catch all instances of how a process can be forced to quit. The places where the current dipatches are located are quite intentional: before and after one full loop of processing. Having this in between might cause inconsitent read models in some scenarios (depending on how they are implemented).

This is a run wrapper we use on our end:

class ProjectionRunWrapper
{
    /** @var  string */
    private $projectionName;

    /** @var  Query|Projector|ReadModelProjector */
    private $projection;

    /** @var  Logger */
    private $logger;

    public function __construct(string $projectionName, $projection, Logger $logger)
    {
        $this->projectionName = $projectionName;
        $this->projection = $projection;
        $this->logger = $logger;
    }

    public function run(): void
    {
        $this->logger->info(
            sprintf('Starting projection \'%s\'', $this->projectionName),
            ['projection' => $this->projectionName, ]
        );

        $this->registerSignalHandlers();

        try {
            $this->projection->run();
        } catch (\Throwable $e) {
            $this->logger->err(
                $e->getMessage(),
                ['projection' => $this->projectionName, 'exception' => $e, ]
            );
            $this->finalize();
        }
    }

    private function registerSignalHandlers(): void
    {
        $pcntlCallback = function () { $this->handleSignals(); };

        pcntl_signal(SIGHUP, $pcntlCallback);
        pcntl_signal(SIGTERM, $pcntlCallback);
        pcntl_signal(SIGINT, $pcntlCallback);
        pcntl_signal(SIGQUIT, $pcntlCallback);
    }

    private function handleSignals(): void
    {
        $this->logger->info(
            sprintf('Attempting graceful shutdown of projection \'%s\'', $this->projectionName),
            ['projection' => $this->projectionName, ]
        );
        if (null === $this->projection) {
            $this->logger->err(
                sprintf('Projection \'%s\' not started', $this->projectionName).
                ['projection' => $this->projectionName, ]
            );
            return;
        }
        $this->finalize();
    }

    private function finalize(): void
    {
        $this->logger->info(
            sprintf( 'Projection \'%s\' stopped', $this->projectionName),
            ['projection' => $this->projectionName, ]
        );
        $this->projection->stop();
        $this->projection = null;
    }
}

fritz-gerneth avatar May 02 '18 07:05 fritz-gerneth

If you want to handle all scenarios for interrupts and means of termination

That's not really the goal here. I just want a fix for regular interrupt.

The places where the current dipatches are located are quite intentional: before and after one full loop of processing. Having this in between might cause inconsitent read models in some scenarios (depending on how they are implemented).

What scenario would be broken by dispatch in the projector persist method? As far as I can tell it seems to be a safe place for it.

This is a run wrapper we use on our end:

I was thinking about something like that too but I still prefer to have it fixed in prooph. Besides this doesn't fix the problem of too long time between dispatch calls which is the main issue.

enumag avatar May 03 '18 07:05 enumag

What scenario would be broken by dispatch in the projector persist method? As far as I can tell it seems to be a safe place for it.

That's an interrupt between applying the data and saving the information this data has been applied. As this is not running within an transaction, quiting before saving the state would result in an event applied in the read model (e.g. row created) but the state not persisted. E.g. on next start the event would be re-applied again. Now this is all old "at least once" delivery and "make your read models idempotent and can be handled to some extend in SQL directly.

I'm wondering about how did your read model get corrupted? pcntl_signal_dispatch is indeed only invoked after all events from all streams have been processed, and before new ones checked. The only scenario I could think of is that you had to process quite a few events in one batch and some process manager killed the projection because it did not respond to SIGINT in time?

fritz-gerneth avatar May 03 '18 08:05 fritz-gerneth

@fritz-gerneth that's no longer true, see https://github.com/prooph/pdo-event-store/pull/144

prolic avatar May 03 '18 08:05 prolic

@fritz-gerneth the change is released with v1.8.1 (https://github.com/prooph/pdo-event-store/releases/tag/v1.8.1)

prolic avatar May 03 '18 08:05 prolic

And here I was sitting on 1.8.0 :) v1.8.1 very well might solve this issue then :)

fritz-gerneth avatar May 03 '18 08:05 fritz-gerneth

I missed that PR too. That should solve my issues with dispatch then.

Next we need a way to configure the options with this bundle:

Currently it's impossible to set it that way because AbstractProjectionCommand doesn't pass any options when calling ProjectionManager::createReadModelProjection().

enumag avatar May 03 '18 08:05 enumag

I think we should pass options to createReadModelProjection and define them in bundle config/parameters (event_store.xml (?)). Also we should be able to register custom signal handlers (collect them via tag). I suppose signal handlers should be placed in your application but we can place SIGINT handler into the symfony-event-store-bundle by default.

adapik avatar May 15 '18 11:05 adapik