dask_cloudprovider.utils.timeout.TimeoutException: Failed to find scheduler ip address after 120 seconds

Open sergii-ivakhno-kidsloop opened this issue 2 years ago • 0 comments

I am getting a timeout after launching a Fargate cluster: dask_cloudprovider.utils.timeout.TimeoutException: Failed to find scheduler ip address after 120 seconds. I know this error has been reported before, but in my case it does not seem to be related to the timeout duration, as I get the same error after waiting for 5 minutes.

Looking at the source code, the issue seems to come from the step that parses the scheduler IP out of the logs at https://github.com/dask/dask-cloudprovider/blob/main/dask_cloudprovider/aws/ecs.py#L209. It tries to read the events in the log response, but the events list is empty when I print the response at this line: https://github.com/dask/dask-cloudprovider/blob/main/dask_cloudprovider/aws/ecs.py#L379

{'events': [], 'nextForwardToken': 'f/36723491871038569079854395880864976970354492776673968128/s', 'nextBackwardToken': 'b/36723331305673139659367776823693413902151649717678768128/s', 'ResponseMetadata': {'RequestId': 'a4af94b5-b125-44c5-8b22-8455cf0c07e4', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'a4af94b5-b125-44c5-8b22-8455cf0c07e4', 'content-type': 'application/x-amz-json-1.1', 'content-length': '178', 'date': 'Tue, 08 Mar 2022 09:18:47 GMT'}, 'RetryAttempts': 0}}
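To illustrate why an empty events list leads to the timeout, here is a minimal sketch of the kind of lookup involved. It is not the library's actual code: the function name find_scheduler_address and the constant SCHEDULER_MARKER are hypothetical, and it assumes the scheduler writes a log line containing "Scheduler at:" followed by its address, and that each event carries its text under a "message" key.

```python
from typing import Optional

# Assumption: the scheduler announces its address in a log line
# containing this marker text.
SCHEDULER_MARKER = "Scheduler at:"


def find_scheduler_address(response: dict) -> Optional[str]:
    """Return the address from the first matching log event, or None.

    `response` is shaped like the log response printed above:
    a dict with an "events" list of {"message": ...} entries.
    """
    for event in response.get("events", []):
        message = event.get("message", "")
        if SCHEDULER_MARKER in message:
            # Keep everything after the marker and trim whitespace.
            return message.split(SCHEDULER_MARKER, 1)[1].strip()
    # No events, or no event with the marker: nothing to parse yet.
    return None
```

With a response like the one above, where events is empty, a lookup of this kind never finds an address no matter how long it keeps polling, which would match seeing the same error with a longer timeout.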

What happened:

TimeoutException

What you expected to happen:

No failure

Minimal Complete Verifiable Example:

from dask_cloudprovider.aws import FargateCluster

cluster = FargateCluster(
    image="prefecthq/prefect:latest",
    execution_role_arn="arn:aws:iam::**:role/dusk-cluster",
    task_role_policies=["arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"],
    platform_version="1.3.0",
    n_workers=3,
    scheduler_cpu=256,
    scheduler_mem=512,
    worker_cpu=256,
    worker_mem=512,
    scheduler_timeout="15 minutes",
    find_address_timeout=120,  # seconds to wait for the scheduler address
)

Anything else we need to know?:

Environment:

  • Dask version: dask-cloudprovider-2022.1.0, dask-2022.2.1
  • Python version: Python 3.8.12
  • Operating System: Ubuntu 18.04.6 LTS
  • Install method (conda, pip, source): Poetry

Cluster Dump State:

sergii-ivakhno-kidsloop • Mar 08 '22 16:03