
`ECS Worker` flow run crashes on `RunTask` because of `Some tags contain invalid characters`

Open • povisenko opened this issue 9 months ago • 4 comments

First check

  • [X] I added a descriptive title to this issue.
  • [X] I used the GitHub search to find a similar issue and didn't find it.
  • [X] I searched the Prefect documentation for this issue.
  • [X] I checked that this issue is related to Prefect and not one of its dependencies.

Bug summary

I am hosting a Prefect server (2.18.1) with an ECS worker.

I created a deployment on the server using `.from_source(Git).deploy()`, pointing it at a simple `hello_world.py`. When I tried to run it, the flow run crashed.

Reproduction

* Prefect Server is up and running
* ECS worker is connected (FARGATE launch type)
* Create a deployment that takes the flow from a Git source, and deploy it
* Run it
* The flow run crashes

Error

Failed to submit flow run '9b42cf44-adca-4748-a6f4-39755d1b7a83' to infrastructure.
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/prefect/workers/base.py", line 904, in _submit_run_and_capture_errors
    result = await self.run(
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 639, in run
    ) = await run_sync_in_worker_thread(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect/utilities/asyncutils.py", line 132, in run_sync_in_worker_thread
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 751, in _create_task_and_wait_for_start
    self._report_task_run_creation_failure(configuration, task_run_request, exc)
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 747, in _create_task_and_wait_for_start
    task = self._create_task_run(ecs_client, task_run_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
          ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.12/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/prefect_aws/workers/ecs_worker.py", line 1632, in _create_task_run
    task = ecs_client.run_task(**task_run_request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation: Some tags contain invalid characters. Valid characters: UTF-8 letters, spaces, numbers and _ . / = + - : @.
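The error message lists the characters AWS accepts in tags. A minimal pre-flight check, sketched here under the assumption that "UTF-8 letters, spaces, numbers" corresponds to the Unicode letter (L*), number (N*) and separator (Z*) categories, can flag offending tags before `RunTask` is called. `is_valid_tag_text` and `find_invalid_tags` are hypothetical helpers, not part of Prefect or prefect-aws; the tag dicts mirror the `{"key": ..., "value": ...}` shape that the worker logs at debug level.

```python
import unicodedata

# Punctuation the RunTask error message lists as allowed, in addition to
# letters, numbers and spaces.
ALLOWED_PUNCTUATION = set("_./=+-:@")


def is_valid_tag_text(text: str) -> bool:
    """Return True if every character is in the set named by the error message.

    Assumption: "UTF-8 letters, spaces, numbers" is approximated with the
    Unicode general categories L* (letters), N* (numbers) and Z* (separators,
    which include the ordinary space).
    """
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N", "Z") or ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True


def find_invalid_tags(tags: list[dict[str, str]]) -> list[dict[str, str]]:
    """Return the tags whose key or value contains a disallowed character."""
    return [
        tag
        for tag in tags
        if not (is_valid_tag_text(tag["key"]) and is_valid_tag_text(tag["value"]))
    ]


if __name__ == "__main__":
    sample_tags = [
        {"key": "prefect.io/flow-run-name", "value": "lime-dragonfly"},
        {"key": "prefect.io/flow-name", "value": "Hello, World"},
    ]
    print(find_invalid_tags(sample_tags))
```

With these two sample tags, only the `prefect.io/flow-name` tag is reported, because `,` is not in the allowed set.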

Versions

The ECS worker runs on the `prefecthq/prefect:2.18-python3.12` image with this command:

`["/bin/sh","-c","pip install prefect-aws==0.4.16 && prefect worker start --pool ecs-pool --type ecs"]`


### Additional context

I did not modify any job variables.

Interestingly, this is a self-hosted setup that I deployed to another environment a few weeks ago, and everything ran fine then.

Now I have deployed it anew, and on the first attempt to run a flow it asked me to grant the ECS worker a new permission, `ecs:TagResource`. I granted it, and after that I started seeing `Some tags contain invalid characters`.

povisenko, Apr 30 '24 18:04

Thanks @povisenko, can you look in the ECS console and let me know if there are any jobs created with tags?

WillRaphaelson, Apr 30 '24 18:04

@WillRaphaelson I'm not sure I understood you, but when I attempt to run the flow from the deployment I expect to see a new ECS service appear with launch type FARGATE.

None of that appears; it seems to fail before that point.

Yet it does create a new revision of the task definition, `prefect_ecs-pool_476287aa-208d-475f-abfe-3cd96b3b4985:16`:

{
    "taskDefinitionArn": "arn:aws:ecs:eu-west-2:746670261417:task-definition/prefect_ecs-pool_476287aa-208d-475f-abfe-3cd96b3b4985:16",
    "containerDefinitions": [
        {
            "name": "prefect",
            "image": "prefecthq/prefect:2.18.1-python3.12",
            "cpu": 0,
            "portMappings": [],
            "essential": true,
            "environment": [],
            "mountPoints": [],
            "volumesFrom": [],
            "systemControls": []
        }
    ],
    "family": "prefect_ecs-pool_476287aa-208d-475f-abfe-3cd96b3b4985",
    "networkMode": "awsvpc",
    "revision": 16,
    "volumes": [],
    "status": "ACTIVE",
    "requiresAttributes": [
        {
            "name": "com.amazonaws.ecs.capability.docker-remote-api.1.18"
        },
        {
            "name": "ecs.capability.task-eni"
        }
    ],
    "placementConstraints": [],
    "compatibilities": [
        "EC2",
        "FARGATE"
    ],
    "requiresCompatibilities": [
        "FARGATE"
    ],
    "cpu": "1024",
    "memory": "2048",
    "registeredAt": "2024-04-30T18:53:54.269Z",
    "registeredBy": "arn:aws:sts::746670261417:assumed-role/prefect_ecs_task_role/5dce7359826a4aa38cdb0fbc9b979674",
    "tags": []
}

povisenko, Apr 30 '24 18:04

@WillRaphaelson In short, it crashes on `RunTask`:

`botocore.errorfactory.InvalidParameterException: An error occurred (InvalidParameterException) when calling the RunTask operation`

povisenko, Apr 30 '24 19:04

@povisenko Can you set the logging level environment variable `PREFECT_LOGGING_LEVEL="DEBUG"` where your worker is running? This will log the constructed task run request, which might help us track down where the invalid characters are.

kevingrismore, Apr 30 '24 21:04

@kevingrismore I found it, thanks!

Here are the tags:

  [
    {
      "key": "prefect.io/flow-run-id",
      "value": "e35d894e-5096-4071-adf4-16bbad3dd375"
    },
    {
      "key": "prefect.io/flow-run-name",
      "value": "lime-dragonfly"
    },
    {
      "key": "prefect.io/version",
      "value": "2.18.1"
    },
    {
      "key": "prefect.io/deployment-id",
      "value": "476287aa-208d-475f-abfe-3cd96b3b4985"
    },
    {
      "key": "prefect.io/deployment-name",
      "value": "Hello World Deployment -Test"
    },
    {
      "key": "prefect.io/deployment-updated",
      "value": "2024-04-30T18:08:56.545156Z"
    },
    {
      "key": "prefect.io/flow-id",
      "value": "0375372b-3dce-4921-a027-baab7d2cbd92"
    },
    {
      "key": "prefect.io/flow-name",
      "value": "Hello, World"
    }
  ]

The issue was with `@flow(name="Hello, World")`.

I removed the `,` and it ran.

I think validation may be needed in the Deployment API or elsewhere.
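One possible form of such a safeguard is to strip disallowed characters from tag values before they are sent. The sketch below is hypothetical: `sanitize_tag_value` is not a Prefect API and is not necessarily what the eventual fix does. It reuses the character set from the `RunTask` error message, approximated with Unicode categories as an assumption, and with the default empty replacement it reproduces the manual workaround of removing the comma.

```python
import unicodedata

# Punctuation the RunTask error message lists as allowed, in addition to
# letters, numbers and spaces.
ALLOWED_PUNCTUATION = set("_./=+-:@")


def sanitize_tag_value(value: str, replacement: str = "") -> str:
    """Replace every character outside the allowed set with `replacement`.

    Hypothetical helper for illustration. Assumption: letters, numbers and
    spaces are matched via the Unicode categories L*, N* and Z*. The caller
    should pass a `replacement` that is itself made of allowed characters.
    """
    return "".join(
        ch
        if unicodedata.category(ch)[0] in ("L", "N", "Z") or ch in ALLOWED_PUNCTUATION
        else replacement
        for ch in value
    )


print(sanitize_tag_value("Hello, World"))  # Hello World
print(sanitize_tag_value("Hello, World", replacement="_"))  # Hello_ World
```

Dropping characters keeps the value readable, while a visible replacement such as `_` makes it obvious that the original name was altered; either way, two distinct flow names could map to the same tag value.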

Leaving the issue open.

povisenko, May 01 '24 10:05

Hi @povisenko, thanks for the issue.

This should be fixed by #13190 and released soon.

zzstoatzz, May 01 '24 17:05

@zzstoatzz thanks!

povisenko, May 01 '24 21:05