dkron
Job is not retried after failure caused by grpc error
Describe the bug
A job with retries > 0 that fails with the following error is not retried:
grpc: Error on execution streaming, agent connection was abruptly terminated: rpc error: code = Internal desc = grpc: error while marshaling: marshaling types.AgentRunStream: size mismatch (see https://github.com/golang/protobuf/issues/1609): calculated=219, measured=128
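The error links to golang/protobuf issue 1609. As I understand it, protobuf-go first calculates the encoded size of a message and then encodes it. If the two numbers disagree, usually because something modified the message between the two passes (for example, another goroutine), it returns this error. The toy Go program below illustrates that two-pass pattern. It is a hypothetical sketch, not protobuf-go's or dkron's code, and `toyMessage` and `marshal` are made-up names:

```go
package main

import "fmt"

// toyMessage is a hypothetical stand-in for a message such as types.AgentRunStream.
type toyMessage struct {
	Output []byte
}

// size returns how many bytes marshal will write: a 1-byte length prefix plus the payload.
func (m *toyMessage) size() int {
	return 1 + len(m.Output)
}

// marshal mimics the two-pass pattern: calculate the size, allocate, then encode.
// The between hook simulates a concurrent mutation that happens between the two passes.
func marshal(m *toyMessage, between func()) ([]byte, error) {
	calculated := m.size()
	buf := make([]byte, 0, calculated)
	if between != nil {
		between()
	}
	buf = append(buf, byte(len(m.Output))) // toy length prefix, valid for payloads < 256 bytes
	buf = append(buf, m.Output...)
	if measured := len(buf); measured != calculated {
		return nil, fmt.Errorf("size mismatch: calculated=%d, measured=%d", calculated, measured)
	}
	return buf, nil
}

func main() {
	msg := &toyMessage{Output: []byte("hello world")}
	// Shrink the payload after the size has been calculated.
	_, err := marshal(msg, func() { msg.Output = msg.Output[:5] })
	fmt.Println(err) // size mismatch: calculated=12, measured=6
}
```

In the real error the measured size (128) is smaller than the calculated one (219), which would fit the message shrinking between the two passes, but that is only a guess.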
To Reproduce
Steps to reproduce the behavior:
- Create a job with retries > 0
- Wait for the error to occur; I don't know what triggers it or how to reproduce it on demand
Expected behavior
The job should be retried when it fails, including when the failure is caused by this gRPC streaming error.
Screenshots
- Job specification:
vader@deathstar:~/work$ curl -s http://127.0.0.1:8080/v1/jobs/failedjob| jq .
{
  "id": "failedjob",
  "name": "failedjob",
  "displayname": "failedjob",
  "timezone": "Europe/Prague",
  "schedule": "0 1 * * * *",
  "owner": "admin.cz",
  "owner_email": "",
  "success_count": 523,
  "error_count": 8,
  "last_success": "2025-11-11T10:01:00.208788688Z",
  "last_error": "2025-11-11T10:01:00.477874671Z",
  "disabled": false,
  "tags": {
    "dc": "dc1:1"
  },
  "metadata": {
    "app": "admin",
    "country": "cz",
    "project": "project"
  },
  "retries": 1,
  "dependent_jobs": null,
  "parent_job": "",
  "processors": {},
  "concurrency": "forbid",
  "executor": "shell",
  "executor_config": {
    "command": "php8.1 bin/console command",
    "cwd": "/var/www/project/current"
  },
  "status": "failed",
  "next": "2025-11-11T11:01:00Z",
  "ephemeral": false,
  "expires_at": null
}
- Last execution:
vader@deathstar:~/work$ curl -s 'http://127.0.0.1:8080/v1/jobs/failedjob/executions?_end=25&_order=DESC&_sort=id&_start=0&jobs=failedjob&output_size_limit=2000' | jq .[0]
{
  "id": "1762855260013775820-server.cz",
  "job_name": "failedjob",
  "started_at": "2025-11-11T11:01:00.01377582+01:00",
  "finished_at": "2025-11-11T11:01:00.477874671+01:00",
  "success": false,
  "output": "grpc: Error on execution streaming, agent connection was abruptly terminated: rpc error: code = Internal desc = grpc: error while marshaling: marshaling types.AgentRunStream: size mismatch (see https://github.com/golang/protobuf/issues/1609): calculated=219, measured=128",
  "node_name": "server.cz",
  "group": 1762855260004896357,
  "attempt": 1,
  "output_truncated": true
}
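The two responses above can be checked against each other. This is a minimal Go sketch that decodes trimmed copies of them and reports whether a retry was still due. It assumes the same retry rule as in the expected behavior (1-based attempts, `retries + 1` attempts in total), and `job`, `execution` and `retryExpected` are hypothetical names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// job and execution hold only the fields needed for the check; the JSON names
// match the API responses above.
type job struct {
	Retries int `json:"retries"`
}

type execution struct {
	Success bool `json:"success"`
	Attempt int  `json:"attempt"`
}

// retryExpected reports whether a failed execution still had attempts left,
// assuming attempts are 1-based and a job allows retries+1 attempts in total.
func retryExpected(j job, e execution) bool {
	return !e.Success && e.Attempt < j.Retries+1
}

func main() {
	var j job
	var e execution
	// Trimmed copies of the job and last execution shown above.
	if err := json.Unmarshal([]byte(`{"id": "failedjob", "retries": 1}`), &j); err != nil {
		panic(err)
	}
	if err := json.Unmarshal([]byte(`{"job_name": "failedjob", "success": false, "attempt": 1}`), &e); err != nil {
		panic(err)
	}
	// Attempt 1 of 2 failed, so a second attempt should have run.
	fmt.Println(retryExpected(j, e)) // true
}
```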
Specifications:
- OS: Ubuntu 22
- Version: 4.0.8
- Executor: shell
- 3 node cluster