Job is not retried after failure caused by grpc error

Open mrhackcz opened this issue 1 month ago • 0 comments

Describe the bug A job with retries > 0 that fails with the error below is not retried.

grpc: Error on execution streaming, agent connection was abruptly terminated: rpc error: code = Internal desc = grpc: error while marshaling: marshaling types.AgentRunStream: size mismatch (see https://github.com/golang/protobuf/issues/1609): calculated=219, measured=128

To Reproduce Steps to reproduce the behavior:

  1. Create a job with retries > 0
  2. Wait for the error to occur; I don't know what triggers it or how to reproduce it reliably

Expected behavior The job should be retried when it fails.
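
To make the expectation concrete, here is a minimal sketch of the retry rule I would expect. It is an illustration only, not Dkron's actual code: the function name `should_retry` is hypothetical, and it assumes attempts are numbered from 1, so a job with `retries = N` may run at most N + 1 times. The cause of the failure (a non-zero exit code or a gRPC streaming error) is deliberately not an input.

```python
def should_retry(success: bool, attempt: int, retries: int) -> bool:
    """Return True if a failed execution still has retry budget left.

    Assumption (not confirmed against Dkron's source): attempts start at 1,
    so `retries = N` allows up to N + 1 runs in total.
    """
    return (not success) and attempt < retries + 1


# Values from this report: "retries": 1, "attempt": 1, "success": false.
print(should_retry(success=False, attempt=1, retries=1))  # True: a second attempt is expected
print(should_retry(success=False, attempt=2, retries=1))  # False: retry budget is used up
```

With the values from this report, the rule says a second attempt should have been scheduled.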

Screenshots

- Job specification:
vader@deathstar:~/work$ curl -s http://127.0.0.1:8080/v1/jobs/failedjob | jq .
{
  "id": "failedjob",
  "name": "failedjob",
  "displayname": "failedjob",
  "timezone": "Europe/Prague",
  "schedule": "0 1 * * * *",
  "owner": "admin.cz",
  "owner_email": "",
  "success_count": 523,
  "error_count": 8,
  "last_success": "2025-11-11T10:01:00.208788688Z",
  "last_error": "2025-11-11T10:01:00.477874671Z",
  "disabled": false,
  "tags": {
    "dc": "dc1:1"
  },
  "metadata": {
    "app": "admin",
    "country": "cz",
    "project": "project"
  },
  "retries": 1,
  "dependent_jobs": null,
  "parent_job": "",
  "processors": {},
  "concurrency": "forbid",
  "executor": "shell",
  "executor_config": {
    "command": "php8.1 bin/console command",
    "cwd": "/var/www/project/current"
  },
  "status": "failed",
  "next": "2025-11-11T11:01:00Z",
  "ephemeral": false,
  "expires_at": null
}

- Last execution:
vader@deathstar:~/work$ curl -s 'http://127.0.0.1:8080/v1/jobs/failedjob/executions?_end=25&_order=DESC&_sort=id&_start=0&jobs=failedjob&output_size_limit=2000' | jq .[0]
{
  "id": "1762855260013775820-server.cz",
  "job_name": "failedjob",
  "started_at": "2025-11-11T11:01:00.01377582+01:00",
  "finished_at": "2025-11-11T11:01:00.477874671+01:00",
  "success": false,
  "output": "grpc: Error on execution streaming, agent connection was abruptly terminated: rpc error: code = Internal desc = grpc: error while marshaling: marshaling types.AgentRunStream: size mismatch (see https://github.com/golang/protobuf/issues/1609): calculated=219, measured=128",
  "node_name": "server.cz",
  "group": 1762855260004896357,
  "attempt": 1,
  "output_truncated": true
}
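
For anyone who wants to check their own cluster for the same symptom, a rough sketch of a script that scans the executions list for failed runs with no follow-up attempt. The helper name `unretried_failures` is hypothetical, and the script assumes that retry attempts of one scheduled run share the same `group` value and differ only in `attempt`; I have not verified that against the Dkron source. The sample input is the execution above, trimmed to the fields the script reads.

```python
import json
from collections import defaultdict


def unretried_failures(executions, retries):
    """Return the `group` ids whose latest attempt failed with retries left.

    Assumption: all attempts of one scheduled run share the same `group`.
    """
    by_group = defaultdict(list)
    for execution in executions:
        by_group[execution["group"]].append(execution)

    missing = []
    for group, runs in by_group.items():
        last = max(runs, key=lambda e: e["attempt"])
        # Failed, and the attempt counter shows budget for another run.
        if not last["success"] and last["attempt"] < retries + 1:
            missing.append(group)
    return missing


# The execution from this report, reduced to the relevant fields.
sample = json.loads("""
[
  {
    "id": "1762855260013775820-server.cz",
    "job_name": "failedjob",
    "success": false,
    "group": 1762855260004896357,
    "attempt": 1
  }
]
""")

print(unretried_failures(sample, retries=1))  # [1762855260004896357]
```

To run it against real data, pass the full JSON array returned by the executions request above (without the `.[0]` filter) together with the job's `retries` value.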

Specifications:

  • OS: Ubuntu 22
  • Version: 4.0.8
  • Executor: shell
  • 3 node cluster

mrhackcz • Nov 11 '25 10:11