[BUG] Getting started guide, can't run on local cluster
Describe the bug
I'm trying out Flyte locally by running through the getting started guide: https://docs.flyte.org/en/latest/getting_started/index.html
The code runs great using pyflyte but doesn't work properly when running on the local demo cluster.
running:
> pyflyte run --remote example.py wf --n 500 --mean 42 --sigma 2
results in the error:
{"asctime": "2022-09-15 13:06:21,069", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.069160907+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-15 13:06:21,270", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-15T13:06:21.269916921+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
File "/home/tux/.local/bin/pyflyte", line 8, in <module>
sys.exit(main())
File "/usr/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 539, in _run
remote_entity = remote.register_script(
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/remote/remote.py", line 596, in register_script
upload_location, md5_bytes = fast_register_single_script(
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 113, in fast_register_single_script
upload_location = create_upload_location_fn(content_md5=md5)
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
return super(SynchronousFlyteClient, self).create_upload_location(
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
return fn(*args, **kwargs)
File "/home/tux/.local/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/tux/.local/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-15T13:06:21.670827636+02:00", grpc_status:14}"
Expected behavior
The example code should run on the local cluster I created using:
> flytectl demo start
Additional context to reproduce
No response
Screenshots
No response
Are you sure this issue hasn't been raised already?
- [X] Yes
Have you read the Code of Conduct?
- [X] Yes
Thank you for opening your first issue here! 🛠
We are suspecting that it's an initialization issue. We'll investigate and comment here with what we discover.
@eapolinario This is standing in the way of evaluating Flyte for our use case, is there a working version I can revert to?
@tamis-laan , first of all, sorry for your trouble. Can you say a bit more about how you're running this? The initial suspicion I had turned out to not be true, so I'm very interested in knowing what's happening in your case.
For example, after you run flytectl demo start, are you able to open http://localhost:30080/console? Can you also double-check the existence of the config file mentioned after flytectl demo start finishes? (In other words, you should be seeing a file called ~/.flyte/config.yaml.)
@tamis-laan , also worth checking if you're setting the environment vars mentioned after flytectl demo start finishes. For example:
❯ flytectl demo start --source .
...
+---------------------------------------------+---------------+-----------+
| SERVICE | STATUS | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-p52mz | Running | flyte |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-724hd | Running | flyte |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-ln8s8 | Running | flyte |
+---------------------------------------------+---------------+-----------+
👨💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/eduardo/.kube/config:/home/eduardo/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/eduardo/.flyte/config-sandbox.yaml
The console is reachable, but I don't have a ~/.flyte/config.yaml. I do have a ~/.flyte/config-sandbox.yaml:
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: localhost:30081
  authType: Pkce
  insecure: true
logger:
  show-source: true
  level: 0
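One way to tell a port that is not listening apart from a name that does not resolve is to open a plain TCP connection to the endpoint from this config. The sketch below is only an illustration: the helper names split_endpoint and can_connect are hypothetical and not part of flytekit. It uses the operating system's resolver through Python's socket module, so it does not exercise the resolver inside the gRPC client.

```python
import socket


def split_endpoint(endpoint: str) -> tuple[str, int]:
    """Split a 'host:port' string such as the admin endpoint into its parts."""
    host, _, port = endpoint.rpartition(":")
    return host, int(port)


def can_connect(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened.

    Name resolution here goes through the OS resolver (getaddrinfo),
    not through the resolver used by the gRPC client.
    """
    host, port = split_endpoint(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution failures, refused connections and timeouts.
        return False
```

Calling can_connect("localhost:30081") while the demo cluster is running should show whether the port itself is reachable. If it returns True while pyflyte still reports a DNS failure, the problem is more likely in how the client resolves the name than in the cluster.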
+---------------------------------------------+---------------+-----------+
| SERVICE | STATUS | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Pending | flyte |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx | Pending | flyte |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb | Running | flyte |
+---------------------------------------------+---------------+-----------+
+---------------------------------------------+---------------+-----------+
| SERVICE | STATUS | NAMESPACE |
+---------------------------------------------+---------------+-----------+
| postgres-bdb75f779-47rzb | Running | flyte |
+---------------------------------------------+---------------+-----------+
| flyte-kubernetes-dashboard-7fd989b99d-znws5 | Running | flyte |
+---------------------------------------------+---------------+-----------+
| minio-55b8c8f4bc-9qtmx | Running | flyte |
+---------------------------------------------+---------------+-----------+
👨💻 Flyte is ready! Flyte UI is available at http://localhost:30080/console 🚀 🚀 🎉
Add KUBECONFIG and FLYTECTL_CONFIG to your environment variable
export KUBECONFIG=$KUBECONFIG:/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
export FLYTECTL_CONFIG=/home/tux/.flyte/config-sandbox.yaml
I have set both environment variables:
> echo $KUBECONFIG
/home/tux/.kube/config:/home/tux/.flyte/k3s/k3s.yaml
> echo $FLYTECTL_CONFIG
/home/tux/.flyte/config-sandbox.yaml
Still, I get the same error:
{"asctime": "2022-09-21 10:10:10,341", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.341471522+02:00\", grpc_status:14}\"\n>, sleeping 200ms and retrying"}
{"asctime": "2022-09-21 10:10:10,542", "name": "flytekit.cli", "levelname": "ERROR", "message": "Non-auth RPC error <_InactiveRpcError of RPC that terminated with:\n\tstatus = StatusCode.UNAVAILABLE\n\tdetails = \"DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data\"\n\tdebug_error_string = \"UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:\"2022-09-21T10:10:10.542086665+02:00\", grpc_status:14}\"\n>, sleeping 400ms and retrying"}
Traceback (most recent call last):
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/bin/pyflyte", line 8, in <module>
sys.exit(main())
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clis/sdk_in_container/run.py", line 542, in _run
remote_entity = remote.register_script(
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/remote/remote.py", line 600, in register_script
upload_location, md5_bytes = fast_register_single_script(
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/tools/script_mode.py", line 111, in fast_register_single_script
upload_location = create_upload_location_fn(content_md5=md5)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/friendly.py", line 998, in get_upload_signed_url
return super(SynchronousFlyteClient, self).create_upload_location(
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 41, in handler
return fn(*args, **kwargs)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/flytekit/clients/raw.py", line 854, in create_upload_location
return self._dataproxy_stub.CreateUploadLocation(create_upload_location_request, metadata=self._metadata)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/tux/.local/share/virtualenvs/getting-started-ZamZP3ef/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data"
debug_error_string = "UNKNOWN:DNS resolution failed for localhost:30081: C-ares status is not ARES_SUCCESS qtype=A name=localhost is_balancer=0: DNS server returned answer with no data {created_time:"2022-09-21T10:10:10.942963692+02:00", grpc_status:14}"
I have tested this on 3 Linux machines myself, and a colleague has tested it on Windows with the same result.
@eapolinario
Is there a way to revert to a working version?
We sync'd offline on this.
The TL;DR is that the default DNS resolver used by the Python gRPC client (c-ares, according to the docs) is unable to resolve the name localhost. Forcing the client to use the OS's native DNS resolver (by setting the environment variable GRPC_DNS_RESOLVER=native) unblocks the issue, although it's still unclear why c-ares was failing.
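For anyone driving flytekit from a Python script rather than a shell, the same workaround can be sketched as below. The only name taken from this thread is GRPC_DNS_RESOLVER. The helper os_resolves is hypothetical, and setting the variable before grpc is first imported is a cautious assumption about when the client reads it.

```python
import os
import socket

# Set this before grpc (or flytekit, which imports it) is first imported,
# so the client picks the OS resolver instead of c-ares.
os.environ["GRPC_DNS_RESOLVER"] = "native"


def os_resolves(host: str, port: int) -> list[str]:
    """Return the addresses the OS resolver reports for host, or [] on failure."""
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

If os_resolves("localhost", 30081) returns a non-empty list while the gRPC client still fails with the default resolver, that matches the behaviour described above: the OS can resolve the name and c-ares cannot.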
This issue in the grpc repo suggests that we collect logs by increasing the verbosity and tracing certain components.
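A rough sketch of collecting those logs from Python follows. GRPC_VERBOSITY and GRPC_TRACE are gRPC's documented environment variables, but the tracer names chosen here (cares_resolver, cares_address_sorting) are an assumption about which components are relevant and may differ between gRPC versions. The helper grpc_debug_env is hypothetical.

```python
import os


def grpc_debug_env(base=None):
    """Return a copy of the environment with verbose gRPC DNS tracing enabled."""
    env = dict(os.environ if base is None else base)
    env["GRPC_VERBOSITY"] = "DEBUG"
    # Assumed tracer names for the c-ares based resolver.
    env["GRPC_TRACE"] = "cares_resolver,cares_address_sorting"
    return env


# Example use (not run here): re-run the failing command with tracing on.
#   import subprocess
#   subprocess.run(
#       ["pyflyte", "run", "--remote", "example.py", "wf",
#        "--n", "500", "--mean", "42", "--sigma", "2"],
#       env=grpc_debug_env(),
#   )
```

The trace output goes to stderr, so redirecting it to a file makes it easier to attach to the upstream issue.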
Hello 👋, This issue has been inactive for over 9 months. To help maintain a clean and focused backlog, we'll be marking this issue as stale and will close the issue if we detect no activity in the next 7 days. Thank you for your contribution and understanding! 🙏
Hello 👋, This issue has been inactive for over 9 months and hasn't received any updates since it was marked as stale. We'll be closing this issue for now, but if you believe this issue is still relevant, please feel free to reopen it. Thank you for your contribution and understanding! 🙏