sdk-php
[Bug] Got the response to undefined request
Describe the bug
Sometimes when a child worker process throws an exception, the parent worker process throws the following panic error:
PanicError: flush queue: SoftJobError:
codec_execute:
sync_worker_exec:
sync_worker_exec_payload: LogicException: Got the response to undefined request 10389 in /srv/vendor/temporal/sdk/src/Internal/Transport/Client.php:60
and right after it:
PanicError: unknown command CommandType: ChildWorkflow, ID: edfb1479-3d88-407e-a428-7e304e0d7bdf, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition

Environment/Versions
- Temporal Server version: 1.16.2, PHP SDK version: 1.2.0
- We use Kubernetes
Additional context
We tried to scale the pods so that they can be split across different availability zones for fault tolerance. Maybe that is causing these problems.
Hey @dmitry-pilipenko 👋🏻. I guess the problem is in the workers' restarts. SoftJobError indicates some error that leads to a process (PHP) restart. I expect the fix will be released next week.
@rustatian Can I do something about it now? This is happening in production now :(
@dmitry-pilipenko, I'm not sure what the initial reason for this error is. Could you please turn on debug logging and send me the log file? Especially the output before and after this error.
@rustatian I hope this helps you: wf-default-6f974d785d-7rspn.log temporalio-history-85d87b8945-vks9v.log
@dmitry-pilipenko Thank you. Could you please update your RR version? You are using an unsupported version (v2.7.4). You may try v2.10.2.
@rustatian Updating RR helped. Your quick response helped me a lot, thank you!
@rustatian this problem is still occurring, but now in cases I can't pin down.
Versions: RR 2.10.2, Temporal Server 1.16.2, PHP SDK 1.3.2
It is probably due to the fact that we wait with a timeout and then throw a custom exception. Example:
yield Temporal::awaitWithTimeout(
    $interval = CarbonInterval::minutes(30),
    fn () => $this->answer !== null
);

if ($this->answer === null) {
    throw new ReplyTimeout($interval);
}
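For more context, the surrounding child workflow is shaped roughly like this (a simplified sketch rather than our real code: the class, signal, and exception names are illustrative, and it calls the SDK's Workflow::awaitWithTimeout directly):

<?php

declare(strict_types=1);

use Carbon\CarbonInterval;
use Temporal\Workflow;
use Temporal\Workflow\SignalMethod;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Stand-in for our real timeout exception.
final class ReplyTimeout extends \RuntimeException
{
    public function __construct(CarbonInterval $interval)
    {
        parent::__construct('No reply received within ' . $interval->forHumans());
    }
}

#[WorkflowInterface]
class WaitForReplyWorkflow
{
    private ?string $answer = null;

    #[SignalMethod]
    public function answer(string $answer): void
    {
        $this->answer = $answer;
    }

    #[WorkflowMethod]
    public function handle()
    {
        // Wait up to 30 minutes for the signal to set $this->answer.
        yield Workflow::awaitWithTimeout(
            $interval = CarbonInterval::minutes(30),
            fn () => $this->answer !== null
        );

        // If the timer fired before the signal arrived, fail with the custom exception.
        if ($this->answer === null) {
            throw new ReplyTimeout($interval);
        }

        return $this->answer;
    }
}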
Trace:
PanicError: sync_worker_exec: SoftJobError:
sync_worker_exec_payload: LogicException: Got the response to undefined request 12445 in /srv/vendor/temporal/sdk/src/Internal/Transport/Client.php:60
Stack trace:
#0 /srv/vendor/temporal/sdk/src/WorkerFactory.php(389): Temporal\Internal\Transport\Client->dispatch()
#1 /srv/vendor/temporal/sdk/src/WorkerFactory.php(261): Temporal\WorkerFactory->dispatch()
#2 /srv/src/Infrastructure/CLI/TemporalWorker.php(67): Temporal\WorkerFactory->run()
#3 /srv/vendor/symfony/console/Command/Command.php(308): App\Infrastructure\CLI\TemporalWorker->execute()
#4 /srv/vendor/symfony/console/Application.php(989): Symfony\Component\Console\Command\Command->run()
#5 /srv/vendor/symfony/console/Application.php(299): Symfony\Component\Console\Application->doRunCommand()
#6 /srv/vendor/symfony/console/Application.php(171): Symfony\Component\Console\Application->doRun()
#7 /srv/vendor/helpcrunch/foundation/src/Runtime/Handler.php(29): Symfony\Component\Console\Application->run()
#8 /srv/vendor/helpcrunch/foundation/src/Runtime/Runner.php(34): Helpcrunch\Foundation\Runtime\Handler->__invoke()
#9 /srv/vendor/autoload_runtime.php(29): Helpcrunch\Foundation\Runtime\Runner->run()
#10 /srv/bin/app(11): require('...')
#11 {main}
process event for default [panic]:
github.com/temporalio/roadrunner-temporal/aggregatedpool.(*Workflow).OnWorkflowTaskStarted(0xc0007a7b30, 0xc00065ba08?)
github.com/temporalio/[email protected]/aggregatedpool/workflow.go:153 +0x2e8
go.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc000835c98, 0xc001a4db80, 0x0?, 0x1)
go.temporal.io/[email protected]/internal/internal_event_handlers.go:815 +0x203
go.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc00079f960, 0xc001b1df50)
go.temporal.io/[email protected]/internal/internal_task_handlers.go:878 +0xca8
go.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc0006c0210, 0xc001b1df50, 0xc000572300)
go.temporal.io/[email protected]/internal/internal_task_handlers.go:727 +0x485
go.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0001131e0, 0xc001b1df50)
go.temporal.io/[email protected]/internal/internal_task_pollers.go:284 +0x2cd
go.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0001131e0, {0x15e0ae0?, 0xc001b1df50?})
go.temporal.io/[email protected]/internal/internal_task_pollers.go:255 +0x6c
go.temporal.io/sdk/internal.(*baseWorker).processTask(0xc000170500, {0x15e06a0?, 0xc0007b8e40})
go.temporal.io/[email protected]/internal/internal_worker_base.go:398 +0x167
created by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher
go.temporal.io/[email protected]/internal/internal_worker_base.go:302 +0xb5
Log file: wf-default-5794999ddc-r4h94.log
Did you update RR to v2.10.4?
yes, I have updated my last comment and added more details
You didn't update RR: according to the stack trace, you are using temporal plugin version v1.4.1, but in 2.10.4 it was updated to v1.4.7.
@rustatian my current RR version is 2.10.2. If I update to 2.10.4 will my problem go away?
Yes, we fixed this problem in the latest version.
@dmitry-pilipenko Please attach a complete sample (a link to your repo is preferable) in the description so we can reproduce your issue.
like this one for example: https://github.com/Torrion/temporal-worker-pool-leak-test
@rustatian, I can provide a child workflow that has problems: https://github.com/helpcrunch/temporal/blob/main/workflows.php
Please remove the unneeded parts from your code and provide a minimal example that can be run with rr, as I showed in the sample. The minimal example should reproduce the bug and contain all the code needed to run it (your .rr.yaml should also be included).
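For reference, a worker entry point like this is usually enough for such a sample (only a sketch: the workflow class names are placeholders, and the .rr.yaml server command should point at this file):

<?php
// worker.php: a minimal RoadRunner/Temporal worker entry point for such a sample.

declare(strict_types=1);

use Temporal\WorkerFactory;

require __DIR__ . '/vendor/autoload.php';

$factory = WorkerFactory::create();

// Register the parent and child workflows involved in the failure on the
// default task queue (placeholder class names; use your real ones).
$worker = $factory->newWorker('default');
$worker->registerWorkflowTypes(MyParentWorkflow::class, MyChildWorkflow::class);

$factory->run();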
@dmitry-pilipenko Friendly ping 😃
@rustatian I haven't been able to reproduce it yet. Our workflows are generated dynamically from user settings, and I'm trying to find out what is causing this problem. I haven't managed to reproduce it locally so far. One of the main differences: locally I deploy with docker-compose, and in production with k8s.
The problem appears only in workflows where we wait for a signal with a timeout. Do you have any hypotheses that might help?
yield Temporal::awaitWithTimeout(
    $interval = CarbonInterval::minutes(30),
    fn () => $this->answer !== null
);

if ($this->answer === null) {
    throw new ReplyTimeout($interval);
}
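The parent starts the child roughly like this (an illustrative sketch, not our exact code, since our workflows are generated dynamically; it reuses the WaitForReplyWorkflow class from the sketch above):

<?php

declare(strict_types=1);

use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

#[WorkflowInterface]
class ParentWorkflow
{
    #[WorkflowMethod]
    public function handle()
    {
        // Start the child workflow and wait for its result. Replaying a
        // StartChildWorkflow command like this one is where the panic above
        // reports "unknown command CommandType: ChildWorkflow".
        $child = Workflow::newChildWorkflowStub(WaitForReplyWorkflow::class);

        return yield $child->handle();
    }
}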
That code executes in the child workflow. After it runs, the parent workflow gets the following error:
{
"eventTime": "2022-07-01T10:17:00.000Z",
"eventType": "WorkflowTaskStarted",
"eventId": "26",
"details": {
"scheduledEventId": "25",
"identity": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8",
"requestId": "3bb264c4-76ac-4962-a6be-8d128650c38c",
"eventId": "26",
"eventType": "WorkflowTaskStarted",
"kvps": [
{
"key": "eventTime",
"value": "Jul 1st 1:17:00 pm"
},
{
"key": "eventId",
"value": "26"
},
{
"key": "scheduledEventId",
"value": "25"
},
{
"key": "identity",
"value": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8"
},
{
"key": "requestId",
"value": "3bb264c4-76ac-4962-a6be-8d128650c38c"
}
],
"eventTime": "Jul 1st 1:17:00 pm"
},
"eventTimeDisplay": "Jul 1st 1:17:00 pm",
"timeElapsedDisplay": "15s",
"eventSummary": {
"requestId": "3bb264c4-76ac-4962-a6be-8d128650c38c",
"eventId": "26",
"eventType": "WorkflowTaskStarted",
"kvps": [
{
"key": "requestId",
"value": "3bb264c4-76ac-4962-a6be-8d128650c38c"
}
]
},
"eventFullDetails": {
"scheduledEventId": "25",
"identity": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8",
"requestId": "3bb264c4-76ac-4962-a6be-8d128650c38c",
"eventId": "26",
"eventType": "WorkflowTaskStarted",
"kvps": [
{
"key": "scheduledEventId",
"value": "25"
},
{
"key": "identity",
"value": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8"
},
{
"key": "requestId",
"value": "3bb264c4-76ac-4962-a6be-8d128650c38c"
}
]
}
},
{
"eventTime": "2022-07-01T10:17:00.000Z",
"eventType": "WorkflowTaskFailed",
"eventId": "27",
"details": {
"scheduledEventId": "25",
"startedEventId": "26",
"cause": "WORKFLOW_TASK_FAILED_CAUSE_NON_DETERMINISTIC_ERROR",
"failure": {
"message": "unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition",
"source": "GoSDK",
"stackTrace": "process event for default [panic]:\ngo.temporal.io/sdk/internal.panicIllegalState(...)\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:409\ngo.temporal.io/sdk/internal.(*commandsHelper).getCommand(0x8?, {0x3?, {0xc000bc2d50?, 0x0?}})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:881 +0x109\ngo.temporal.io/sdk/internal.(*commandsHelper).handleStartChildWorkflowExecutionInitiated(0x7f7cc1a01f18?, {0xc000bc2d50?, 0xc000196000?})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:1124 +0x29\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc001373770, 0xc00156aa00, 0xd8?, 0x0)\n\tgo.temporal.io/[email protected]/internal/internal_event_handlers.go:905 +0x6ae\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0007f9080, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:902 +0xd68\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc000a40c60, 0xc000559680, 0xc000702510)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:749 +0x485\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0005b3a00, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:284 +0x2cd\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0005b3a00, {0x16063c0?, 0xc000559680?})\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:255 +0x6c\ngo.temporal.io/sdk/internal.(*baseWorker).processTask(0xc00067e8c0, {0x1605f80?, 0xc00047f9c0})\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:400 +0x167\ncreated by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:305 +0xb5",
"cause": null,
"applicationFailureInfo": {
"type": "PanicError",
"nonRetryable": true,
"details": null
},
"failureInfo": "applicationFailureInfo"
},
"identity": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8",
"baseRunId": "",
"newRunId": "",
"forkEventVersion": "0",
"binaryChecksum": "23bc61c98cc56611c0691d0c4fd23834",
"eventId": "27",
"eventType": "WorkflowTaskFailed",
"kvps": [
{
"key": "eventTime",
"value": "Jul 1st 1:17:00 pm"
},
{
"key": "eventId",
"value": "27"
},
{
"key": "scheduledEventId",
"value": "25"
},
{
"key": "startedEventId",
"value": "26"
},
{
"key": "cause",
"value": "WORKFLOW_TASK_FAILED_CAUSE_NON_DETERMINISTIC_ERROR"
},
{
"key": "failure",
"value": "PanicError: unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition \nprocess event for default [panic]:\ngo.temporal.io/sdk/internal.panicIllegalState(...)\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:409\ngo.temporal.io/sdk/internal.(*commandsHelper).getCommand(0x8?, {0x3?, {0xc000bc2d50?, 0x0?}})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:881 +0x109\ngo.temporal.io/sdk/internal.(*commandsHelper).handleStartChildWorkflowExecutionInitiated(0x7f7cc1a01f18?, {0xc000bc2d50?, 0xc000196000?})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:1124 +0x29\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc001373770, 0xc00156aa00, 0xd8?, 0x0)\n\tgo.temporal.io/[email protected]/internal/internal_event_handlers.go:905 +0x6ae\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0007f9080, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:902 +0xd68\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc000a40c60, 0xc000559680, 0xc000702510)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:749 +0x485\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0005b3a00, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:284 +0x2cd\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0005b3a00, {0x16063c0?, 0xc000559680?})\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:255 +0x6c\ngo.temporal.io/sdk/internal.(*baseWorker).processTask(0xc00067e8c0, {0x1605f80?, 0xc00047f9c0})\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:400 +0x167\ncreated by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:305 +0xb5"
},
{
"key": "identity",
"value": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8"
},
{
"key": "baseRunId",
"value": ""
},
{
"key": "newRunId",
"value": ""
},
{
"key": "forkEventVersion",
"value": "0"
},
{
"key": "binaryChecksum",
"value": "23bc61c98cc56611c0691d0c4fd23834"
}
],
"eventTime": "Jul 1st 1:17:00 pm"
},
"eventTimeDisplay": "Jul 1st 1:17:00 pm",
"timeElapsedDisplay": "15s",
"eventSummary": {
"message": "unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition",
"eventId": "27",
"eventType": "WorkflowTaskFailed",
"kvps": [
{
"key": "message",
"value": "unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition"
}
]
},
"eventFullDetails": {
"scheduledEventId": "25",
"startedEventId": "26",
"cause": "WORKFLOW_TASK_FAILED_CAUSE_NON_DETERMINISTIC_ERROR",
"failure": {
"message": "unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition",
"source": "GoSDK",
"stackTrace": "process event for default [panic]:\ngo.temporal.io/sdk/internal.panicIllegalState(...)\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:409\ngo.temporal.io/sdk/internal.(*commandsHelper).getCommand(0x8?, {0x3?, {0xc000bc2d50?, 0x0?}})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:881 +0x109\ngo.temporal.io/sdk/internal.(*commandsHelper).handleStartChildWorkflowExecutionInitiated(0x7f7cc1a01f18?, {0xc000bc2d50?, 0xc000196000?})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:1124 +0x29\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc001373770, 0xc00156aa00, 0xd8?, 0x0)\n\tgo.temporal.io/[email protected]/internal/internal_event_handlers.go:905 +0x6ae\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0007f9080, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:902 +0xd68\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc000a40c60, 0xc000559680, 0xc000702510)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:749 +0x485\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0005b3a00, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:284 +0x2cd\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0005b3a00, {0x16063c0?, 0xc000559680?})\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:255 +0x6c\ngo.temporal.io/sdk/internal.(*baseWorker).processTask(0xc00067e8c0, {0x1605f80?, 0xc00047f9c0})\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:400 +0x167\ncreated by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:305 +0xb5",
"cause": null,
"applicationFailureInfo": {
"type": "PanicError",
"nonRetryable": true,
"details": null
},
"failureInfo": "applicationFailureInfo"
},
"identity": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8",
"baseRunId": "",
"newRunId": "",
"forkEventVersion": "0",
"binaryChecksum": "23bc61c98cc56611c0691d0c4fd23834",
"eventId": "27",
"eventType": "WorkflowTaskFailed",
"kvps": [
{
"key": "scheduledEventId",
"value": "25"
},
{
"key": "startedEventId",
"value": "26"
},
{
"key": "cause",
"value": "WORKFLOW_TASK_FAILED_CAUSE_NON_DETERMINISTIC_ERROR"
},
{
"key": "failure",
"value": "PanicError: unknown command CommandType: ChildWorkflow, ID: 31ab4b00-a18a-44e8-851a-baf9de182600, possible causes are nondeterministic workflow definition code or incompatible change in the workflow definition \nprocess event for default [panic]:\ngo.temporal.io/sdk/internal.panicIllegalState(...)\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:409\ngo.temporal.io/sdk/internal.(*commandsHelper).getCommand(0x8?, {0x3?, {0xc000bc2d50?, 0x0?}})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:881 +0x109\ngo.temporal.io/sdk/internal.(*commandsHelper).handleStartChildWorkflowExecutionInitiated(0x7f7cc1a01f18?, {0xc000bc2d50?, 0xc000196000?})\n\tgo.temporal.io/[email protected]/internal/internal_decision_state_machine.go:1124 +0x29\ngo.temporal.io/sdk/internal.(*workflowExecutionEventHandlerImpl).ProcessEvent(0xc001373770, 0xc00156aa00, 0xd8?, 0x0)\n\tgo.temporal.io/[email protected]/internal/internal_event_handlers.go:905 +0x6ae\ngo.temporal.io/sdk/internal.(*workflowExecutionContextImpl).ProcessWorkflowTask(0xc0007f9080, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:902 +0xd68\ngo.temporal.io/sdk/internal.(*workflowTaskHandlerImpl).ProcessWorkflowTask(0xc000a40c60, 0xc000559680, 0xc000702510)\n\tgo.temporal.io/[email protected]/internal/internal_task_handlers.go:749 +0x485\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).processWorkflowTask(0xc0005b3a00, 0xc000559680)\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:284 +0x2cd\ngo.temporal.io/sdk/internal.(*workflowTaskPoller).ProcessTask(0xc0005b3a00, {0x16063c0?, 0xc000559680?})\n\tgo.temporal.io/[email protected]/internal/internal_task_pollers.go:255 +0x6c\ngo.temporal.io/sdk/internal.(*baseWorker).processTask(0xc00067e8c0, {0x1605f80?, 0xc00047f9c0})\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:400 +0x167\ncreated by go.temporal.io/sdk/internal.(*baseWorker).runTaskDispatcher\n\tgo.temporal.io/[email protected]/internal/internal_worker_base.go:305 +0xb5"
},
{
"key": "identity",
"value": "default:3bfa8009-a174-424b-a20c-2eb5f01c93e8"
},
{
"key": "baseRunId",
"value": ""
},
{
"key": "newRunId",
"value": ""
},
{
"key": "forkEventVersion",
"value": "0"
},
{
"key": "binaryChecksum",
"value": "23bc61c98cc56611c0691d0c4fd23834"
}
]
}
}
@dmitry-pilipenko 👋🏻 Could it be that you use different RR versions locally and in k8s? We had this issue in past versions.
@rustatian versions are completely identical.
k8s:
docker-compose:
@rustatian Now I have found a case where there was no awaiting with a timeout in the flow, but it still causes the problem. I exported the workflow history from the admin: c0818c95 9669 4de5 ab5e 855e2de2f2d8 - e121f3e0-3df8-4e25-b0c0-d0e5de289955.json.zip
@dmitry-pilipenko Thanks for the logs, but to help you we need to reproduce this issue. Please, as I suggested earlier, create a repository with a reproducible sample that includes your .rr.yaml and a minimal sample app. It can either run in Docker or with rr serve.
I don't know if this is of any help, but I've experienced the same issue twice; both times our pods were short on available memory. I don't know how that happens or whether it's the same in your case.
Do you have a supervisor in your RR configuration?
Nope, we probably should, but after increasing memory everything has been stable, handling 4+ million workflows a day for half a year now. I'll enable it when I get the chance.
It's not that PHP or RR was leaking memory; we mistakenly set the memory limit too low, constraining the pod.
wow, those are big numbers 😮
Could it be that OOM kills the workflow worker?
Yeah, it should kill the whole pod because of OOM, but it gets into a weird state with undefined request before that. I haven't investigated it enough to reproduce it :(
Ok, thanks. Please keep us updated; if we can reproduce this weird issue, we will fix it ASAP.
I'll try to reproduce it with OOM when I get some free time :D :pray: