feat: Request stop of running rules
## Description
The main focus of this PR is adding the ability to stop an in-progress rule handler.
## Base Task updates
To support this functionality, I have updated the base task to allow aborting execution via an AbortController. The executeTask method is passed an AbortSignal, which can be used to terminate early at sensible points. In the rule executor's case, this is during the rule comparison stage, where we spend most of the execution time and where we do not commit changes until the very end.
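A minimal sketch of the pattern described above; class and method names here (BaseTask, RuleExecutorTask) are illustrative, not the PR's exact identifiers:

```typescript
// Base task owns an AbortController and hands its signal to subclasses.
abstract class BaseTask {
  private abortController?: AbortController;
  protected isRunning = false;

  async execute(): Promise<void> {
    if (this.isRunning) return; // guard against overlapping runs
    this.isRunning = true;
    this.abortController = new AbortController();
    try {
      await this.executeTask(this.abortController.signal);
    } finally {
      this.isRunning = false;
    }
  }

  stop(): void {
    this.abortController?.abort();
  }

  protected abstract executeTask(signal: AbortSignal): Promise<void>;
}

// A rule-executor-style task that checks the signal between comparisons,
// so a stop request takes effect before anything is committed.
class RuleExecutorTask extends BaseTask {
  public processed = 0;

  protected async executeTask(signal: AbortSignal): Promise<void> {
    for (let i = 0; i < 100; i++) {
      if (signal.aborted) return; // sensible early-exit point
      this.processed++;
      await new Promise((r) => setTimeout(r, 1)); // simulate comparison work
    }
  }
}
```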
I have also updated how execute gets called to remove duplicate logic and the super calls within the tasks. Additionally, I fixed a bug where updating the cron schedule would set the isRunning flag to false even if the task was running at the time, which could have led to multiple concurrent executions of the same task.
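To illustrate the reschedule bug (names are hypothetical, not the PR's exact code): swapping the cron schedule must leave isRunning alone, otherwise a task that is mid-run looks idle and can be started a second time.

```typescript
class ScheduledTask {
  public isRunning = false;
  private cronExpression: string;

  constructor(cron: string) {
    this.cronExpression = cron;
  }

  // Buggy version reset the running state when rescheduling:
  //   updateSchedule(cron) { this.cronExpression = cron; this.isRunning = false; }
  // Fixed version: only the schedule changes; the running state is untouched.
  updateSchedule(cron: string): void {
    this.cronExpression = cron;
  }

  getSchedule(): string {
    return this.cronExpression;
  }
}
```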
## Graceful shutdown
I have also ensured the server shuts down gracefully by enabling shutdown hooks. The base task now implements onApplicationShutdown, where we abort the task execution using the very same AbortController. We wait for the task to complete before resolving the promise, pausing termination. This change should decrease the likelihood of data corruption, as terminating mid-way through some parts of the rule executor can leave data in an inconsistent state.
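The hook could look roughly like the sketch below. The interface is defined inline here to keep the snippet self-contained; real code would implement NestJS's `OnApplicationShutdown` from `@nestjs/common`, and the runner name is illustrative.

```typescript
// Stand-in for NestJS's lifecycle interface.
interface OnApplicationShutdown {
  onApplicationShutdown(signal?: string): Promise<void>;
}

class AbortableTaskRunner implements OnApplicationShutdown {
  private controller = new AbortController();
  private running: Promise<void> = Promise.resolve();

  start(work: (signal: AbortSignal) => Promise<void>): void {
    this.running = work(this.controller.signal);
  }

  async onApplicationShutdown(): Promise<void> {
    this.controller.abort(); // ask the task to stop...
    await this.running;      // ...and hold termination until it has
  }
}
```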
Whilst testing this change, I came across instances where the server would not exit even though all tasks had stopped. This was caused by lingering active connections. I fixed it in two ways:
- Using nestjs-graceful-shutdown to track and gracefully terminate any active connections. It does this using http-terminator and shutdown hooks.
- Updating the controllers that house SSE endpoints to close all active client connections on beforeApplicationShutdown.
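For the second point, a controller holding SSE connections could track its clients and close them in the hook, roughly like this sketch (the interface is inlined for self-containment; real code implements NestJS's `BeforeApplicationShutdown`, and `SseClient` stands in for an open response stream):

```typescript
interface BeforeApplicationShutdown {
  beforeApplicationShutdown(signal?: string): void;
}

// Minimal stand-in for a response held open for server-sent events.
interface SseClient {
  ended: boolean;
  end(): void;
}

class EventsController implements BeforeApplicationShutdown {
  private clients = new Set<SseClient>();

  addClient(client: SseClient): void {
    this.clients.add(client);
  }

  // Close every open stream before the HTTP server is torn down.
  beforeApplicationShutdown(): void {
    for (const client of this.clients) client.end();
    this.clients.clear();
  }
}
```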
## Supervisord changes
I have also updated the supervisord configuration. The first change extends the time before SIGKILL is sent from the default 10 seconds to 60 seconds, which should give any tasks enough time to finish up. Secondly, I have added a [group] section, which ensures both applications are sent signals at the same time; I was having issues with this and came across this GitHub comment suggesting the approach.
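The config changes would take roughly this shape (a sketch, not the PR's exact file; program names and commands are illustrative, while `stopwaitsecs` and `[group]` are the supervisord options in question):

```ini
[program:server]
command=node /opt/app/server/main.js
; Raise the SIGTERM-to-SIGKILL grace period from the 10s default
; so in-flight tasks can finish.
stopwaitsecs=60

[program:ui]
command=node /opt/app/ui/server.js
stopwaitsecs=60

; Grouping makes supervisord deliver signals to both programs together.
[group:maintainerr]
programs=server,ui
```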
## Next.js changes
If you're on the app and it stops or restarts, the client will attempt to reconnect to the SSE endpoints. If the UI app is available but not the server, the proxied requests from the UI app will fail and it can be a bit spammy. I have wrapped the event and log SSE proxy endpoints in a try-catch to log a more sensible error message.
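The wrapper amounts to something like this sketch (the function name, parameters, and log message are hypothetical, not the PR's exact code):

```typescript
// Proxy an upstream SSE endpoint; if the server side is down or drops
// the connection, log one short line instead of a noisy stack trace.
async function proxyEventStream(
  fetchUpstream: () => Promise<Response>,
  log: (message: string) => void,
): Promise<Response | null> {
  try {
    return await fetchUpstream();
  } catch {
    log('SSE connection to the server closed; it may be stopping or restarting');
    return null;
  }
}
```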
## Checklist
- [x] I have read the CONTRIBUTING.md document.
- [x] I have performed a self-review of my code.
- [x] I have linted and formatted my code.
- [x] My changes generate no new warnings.
- [x] New and existing unit tests pass locally with my changes.
## How to test

1. Use the PR Docker tag for testing, and bring up the container logs so you can see what's happening.
2. Create a rule that takes some time to complete.
3. Run rules and wait 5 seconds so it gets going.
4. Click **Stop rules**.
5. Among the logs, you should see these lines shortly after clicking Stop rules:

   ```
   [maintainerr] | 21/04/2025 23:48:34 [INFO] [RuleExecutorService] Stopping the Rule Handler task...
   [maintainerr] | 21/04/2025 23:48:35 [INFO] [RuleExecutorService] Task Rule Handler stopped
   ```

Then repeat steps 1, 2 and 3, followed by:

4. Stop the Docker container; it should exit shortly after.
5. Refer to step 5 above.
## Additional context
This change provides one of the options to fix the issue described here: https://github.com/jorenn92/Maintainerr/issues/1589
/release-pr
Released to jorenn92/maintainerr:pr-1735 :rocket:
Stopping running rules worked correctly and as expected.
Stopping the container while rules are running worked as well, but I received this in my container log. Figured I would share in case you see something wrong here.
```
maintainerr_pretest | 2025-05-20 14:05:26,491 WARN received SIGTERM indicating exit request
maintainerr_pretest | 2025-05-20 14:05:26,492 INFO waiting for server, ui to die
maintainerr_pretest | 2025-05-20 14:05:29,495 INFO waiting for server, ui to die
maintainerr_pretest | [maintainerr] | 20/05/2025 10:05:31 [INFO] [RuleExecutorService] Stopping the Rule Handler task...
maintainerr_pretest | ⨯ [Error: failed to pipe response] {
maintainerr_pretest |   [cause]: [TypeError: terminated] {
maintainerr_pretest |     [cause]: [Error [SocketError]: other side closed] {
maintainerr_pretest |       code: 'UND_ERR_SOCKET',
maintainerr_pretest |       socket: {
maintainerr_pretest |         localAddress: '::1',
maintainerr_pretest |         localPort: 36310,
maintainerr_pretest |         remoteAddress: '::1',
maintainerr_pretest |         remotePort: 3001,
maintainerr_pretest |         remoteFamily: 'IPv6',
maintainerr_pretest |         timeout: undefined,
maintainerr_pretest |         bytesWritten: 452,
maintainerr_pretest |         bytesRead: 7884
maintainerr_pretest |       }
maintainerr_pretest |     }
maintainerr_pretest |   }
maintainerr_pretest | }
maintainerr_pretest | ⨯ [Error: failed to pipe response] {
maintainerr_pretest |   [cause]: [TypeError: terminated] {
maintainerr_pretest |     [cause]: [Error [SocketError]: other side closed] {
maintainerr_pretest |       code: 'UND_ERR_SOCKET',
maintainerr_pretest |       socket: {
maintainerr_pretest |         localAddress: '::1',
maintainerr_pretest |         localPort: 36310,
maintainerr_pretest |         remoteAddress: '::1',
maintainerr_pretest |         remotePort: 3001,
maintainerr_pretest |         remoteFamily: 'IPv6',
maintainerr_pretest |         timeout: undefined,
maintainerr_pretest |         bytesWritten: 452,
maintainerr_pretest |         bytesRead: 7884
maintainerr_pretest |       }
maintainerr_pretest |     }
maintainerr_pretest |   }
maintainerr_pretest | }
maintainerr_pretest | 2025-05-20 14:05:31,550 INFO stopped: ui (exit status 0)
maintainerr_pretest | [maintainerr] | 20/05/2025 10:05:32 [INFO] [RuleExecutorService] Task Rule Handler stopped
maintainerr_pretest | 2025-05-20 14:05:32,526 INFO waiting for server to die
maintainerr_pretest | 2025-05-20 14:05:32,571 WARN stopped: server (terminated by SIGTERM)
maintainerr_pretest exited with code 0
```
Thank you for testing 😄 The error is harmless: the server app is killing off any active connections, and because these are proxied via the UI app, the error surfaces there. This will be the /api/events/stream endpoint, as it's a persistent connection. I tried to catch these before and output a less scary log line, but for whatever reason they got through. I'll have another play around with it later and see if I missed something obvious.
@ydkmlt84 I couldn't figure the logs out. I think they're coming from within Next.js itself. They're harmless either way.
:tada: This PR is included in version 2.16.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket: