[BUG] Getting started guide, can't run on local cluster

Open tamis-laan opened this issue 3 years ago • 8 comments

Describe the bug

I'm trying out Flyte locally running through the getting started guide: https://docs.flyte.org/en/latest/getting_started/index.html

The code runs great using pyflyte but doesn't work properly when running on the local demo cluster.

running:

> pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2

results in the error:

{"asctime": "2022-09-15 13:06:21,069", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.069160907+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-15 13:06:21,270", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.269916921+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/tux/.local/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 539, in _run
    remote_entity = remote.register_script(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/remote/remote.py", line 596, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 113, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
	debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-15T13:06:21.670827636+02:00", grpc_status:14}"

Expected behavior

The example code should run on the local cluster I created using:

> flytectl demo start

Additional context to reproduce

No response

Screenshots

No response

Are you sure this issue hasn't been raised already?

  • [X] Yes

Have you read the Code of Conduct?

  • [X] Yes

tamis-laan avatar Sep 15 '22 11:09 tamis-laan

Thank you for opening your first issue here! 🛠

welcome[bot] avatar Sep 15 '22 11:09 welcome[bot]

We suspect that it's an initialization issue. We'll investigate and comment here with what we find.

eapolinario avatar Sep 16 '22 17:09 eapolinario

@eapolinario This is standing in the way of evaluating Flyte for our use case. Is there a working version I can revert to?

tamis-laan avatar Sep 18 '22 10:09 tamis-laan

@tamis-laan , first of all, sorry for your trouble. Can you say a bit more about how you're running this? The initial suspicion I had turned out to not be true, so I'm very interested in knowing what's happening in your case.

For example, after you run flytectl demo start, are you able to open http://localhost:30080/console? Can you also double-check that the config file mentioned after flytectl demo start finishes exists? (In other words, you should see a file called ~/.flyte/config.yaml.)
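
Both checks can also be run from Python. This is only a minimal sketch: it assumes the demo defaults seen in this thread (console on port 30080, admin gRPC endpoint on port 30081) and the two config file names mentioned here, and the helper names are made up for illustration.

```python
import socket
from pathlib import Path

# Assumed demo defaults taken from this thread; adjust if your setup differs.
CANDIDATE_CONFIGS = ("config.yaml", "config-sandbox.yaml")
PORTS = {"console": 30080, "admin gRPC": 30081}


def existing_configs(flyte_dir: Path) -> list[str]:
    """Return the candidate config file names that exist in flyte_dir."""
    return [name for name in CANDIDATE_CONFIGS if (flyte_dir / name).is_file()]


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("configs found:", existing_configs(Path.home() / ".flyte"))
    for label, port in PORTS.items():
        # 127.0.0.1 is used so the check does not depend on resolving "localhost".
        print(f"{label} ({port}) reachable:", port_is_open("127.0.0.1", port))
```

If the ports are reachable via 127.0.0.1 but the client still fails on localhost, that points at name resolution rather than at the cluster itself.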

eapolinario avatar Sep 20 '22 00:09 eapolinario

@tamis-laan , also worth checking if you're setting the environment vars mentioned after flytectl demo start finishes. For example:

❯ flytectl demo start --source .
...
+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-p52mz | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-724hd                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-ln8s8                      | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/eduardo/.kube/config:/home/eduardo/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/eduardo/.flyte/config-sandbox.yaml

eapolinario avatar Sep 20 '22 18:09 eapolinario

The console is reachable, but I don't have a ~/.flyte/config.yaml. I do have a ~/.flyte/config-sandbox.yaml:

   admin:
     # For GRPC endpoints you might want to use dns:///flyte.myexample.com
     endpoint: localhost:30081
     authType: Pkce
     insecure: true
   logger:
     show-source: true
     level: 0

tamis-laan avatar Sep 21 '22 08:09 tamis-laan

+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Pending       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx                      | Pending       | flyte     |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
+---------------------------------------------+---------------+-----------+
|                   SERVICE                   |    STATUS     | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb                    | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx                      | Running       | flyte     |
+---------------------------------------------+---------------+-----------+
👨‍💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/tux/.flyte/config-sandbox.yaml

I have set both environment variables:

> echo $KUBECONFIG
/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
> echo $FLYTECTL_CONFIG
/home/tux/.flyte/config-sandbox.yaml

I still get the same error:

{"asctime": "2022-09-21 10:10:10,341", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.341471522+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-21 10:10:10,542", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.542086665+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/bin/pyflyte", line 8, in <module>
    sys.exit(main())
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 542, in _run
    remote_entity = remote.register_script(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/remote/remote.py", line 600, in register_script
    upload_location, md5_bytes = fast_register_single_script(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 111, in fast_register_single_script
    upload_location = create_upload_location_fn(content_md5=md5)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
    return super(SynchronousFlyteClient, self).create_upload_location(
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
    return fn(*args, **kwargs)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
    return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.UNAVAILABLE
	details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
	debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-21T10:10:10.942963692+02:00", grpc_status:14}"

I have tested this on 3 Linux machines myself, and a colleague has tested it on Windows with the same result.

tamis-laan avatar Sep 21 '22 08:09 tamis-laan

@eapolinario

Is there a way to revert to a working version?

tamis-laan avatar Sep 22 '22 07:09 tamis-laan

We sync'd offline on this.

The tl;dr is that the default DNS resolver used by the Python gRPC client (c-ares, according to the docs) is unable to resolve the name localhost. Forcing the client to use the OS's native DNS resolver (by setting the environment variable GRPC_DNS_RESOLVER=native) works around the problem, although it's still unclear why c-ares was failing.
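
For anyone hitting the same error, here is a minimal sketch of that workaround, not an official fix. It first asks the OS resolver what it returns for localhost, then launches pyflyte with GRPC_DNS_RESOLVER=native in its environment. The helper names are hypothetical; the pyflyte arguments are the ones from the original report.

```python
import os
import shutil
import socket
import subprocess


def os_resolved_addresses(host: str, port: int) -> list[str]:
    """Addresses the OS resolver returns for host (IPv4 and/or IPv6)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def native_resolver_env(base: dict[str, str]) -> dict[str, str]:
    """Copy of base with gRPC told to use the OS resolver instead of c-ares."""
    env = dict(base)
    env["GRPC_DNS_RESOLVER"] = "native"
    return env


if __name__ == "__main__":
    try:
        print("OS resolver says:", os_resolved_addresses("localhost", 30081))
    except socket.gaierror as exc:
        print("OS resolver could not resolve localhost either:", exc)

    cmd = ["pyflyte", "run", "--remote", "example.py", "wf",
           "--n", "500", "--mean", "42", "--sigma", "2"]
    if shutil.which("pyflyte"):
        # Only the child process sees the override; the current shell is untouched.
        subprocess.run(cmd, env=native_resolver_env(dict(os.environ)), check=False)
    else:
        print("pyflyte not found on PATH; skipping the run")
```

Exporting GRPC_DNS_RESOLVER=native in the shell before calling pyflyte has the same effect.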

This issue in the grpc repo suggests that we collect logs by increasing the verbosity and tracing certain components.
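
A rough sketch of how such logs could be collected: gRPC reads the GRPC_VERBOSITY and GRPC_TRACE environment variables, so they need to be set before the client starts. The tracer names used below (cares_resolver, cares_address_sorting) are an assumption based on gRPC's environment-variable documentation and may differ between versions; the helper name is hypothetical.

```python
import os


def grpc_debug_env(base: dict[str, str], tracers: tuple[str, ...]) -> dict[str, str]:
    """Copy of base with gRPC debug logging and the given tracers enabled."""
    env = dict(base)
    env["GRPC_VERBOSITY"] = "DEBUG"
    # GRPC_TRACE takes a comma-separated list of tracer names.
    env["GRPC_TRACE"] = ",".join(tracers)
    return env


if __name__ == "__main__":
    # Assumed tracer names; check the gRPC docs for your installed version.
    env = grpc_debug_env(dict(os.environ), ("cares_resolver", "cares_address_sorting"))
    for key in ("GRPC_VERBOSITY", "GRPC_TRACE"):
        # Print shell lines that can be pasted before re-running pyflyte.
        print(f"export {key}={env[key]}")
```

Re-running the failing pyflyte command with these variables set should produce resolver-level logs that can be attached to the issue.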

eapolinario avatar Oct 05 '22 03:10 eapolinario

Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Sep 04 '23 00:09 github-actions[bot]

Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏

github-actions[bot] avatar Sep 12 '23 01:09 github-actions[bot]