dask-kubernetes
Native Kubernetes integration for Dask
We are currently running Dask on Kubernetes. At the moment we scale the number of workers using Prometheus metrics from the scheduler, which return information about the...
Previously, if the service type was a NodePort, KubeCluster would attempt to discover the IP address by listing nodes. This is a set of permissions that not all users will...
This library manages pods directly via the Kubernetes API. This is a design decision we made for [many reasons](https://github.com/dask/dask-kubernetes/issues/168#issuecomment-517210364). People often ask why we didn't use a different resource type,...
**Describe the issue**: I'm consistently getting "kubectl port forward failed" from `port_forward_service` using the standard `KubeCluster` class to create an ad hoc cluster. When this happens the port forward is...
Would it be possible to track the status of a job in the toplevel `DaskJob` CR? This would have the advantage of hiding the "implementation details" of the job from...
Installing the operator means applying 4 manifests (5 with #451). _Source: https://kubernetes.dask.org/en/latest/operator_installation.html_ Given that we are [automatically building the CRDs from templates](https://github.com/dask/dask-kubernetes/blob/main/ci/pre-commit-crd.py) we could extend this to finally concatenate all...
Is it possible to change the service name? I'm having issues connecting my workers and other containers to the scheduler using `dask-scheduler:8786`. The `dask-scheduler:8786` value is hardcoded in...
The CI is taking nearly 30 minutes at the moment, so I am experimenting with running tests in parallel to bring that time down. Initial timings: | Suite | Time (approx) | |...
**What happened**: KubeCluster times out when creating a cluster with a NodePort service because it looks for the scheduler at port 8786 when the service is actually exposed on a randomized port (e.g. 32367). Note, the...
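To illustrate the port mismatch described in this report, here is a minimal sketch of how the externally reachable port could be read from a Service manifest. It is not the dask-kubernetes implementation: the helper name `scheduler_node_port` and the example manifest are hypothetical, and the only thing assumed about Kubernetes is the standard Service layout, where each entry in `spec.ports` carries a `port` and, for NodePort services, a `nodePort`.

```python
from typing import Any, Dict


def scheduler_node_port(service: Dict[str, Any], scheduler_port: int = 8786) -> int:
    """Return the nodePort that maps to the scheduler's service port.

    `service` is a Service manifest as a plain dict (hypothetical input,
    e.g. parsed from `kubectl get service -o json`).
    """
    for entry in service.get("spec", {}).get("ports", []):
        if entry.get("port") == scheduler_port:
            node_port = entry.get("nodePort")
            if node_port is None:
                # The service is not of type NodePort (or the port is not yet allocated).
                raise ValueError(f"port {scheduler_port} has no nodePort assigned")
            return node_port
    raise LookupError(f"service does not expose port {scheduler_port}")


# Hypothetical manifest mirroring the numbers in the report above.
example_service = {
    "spec": {
        "type": "NodePort",
        "ports": [
            {"name": "comm", "port": 8786, "targetPort": 8786, "nodePort": 32367},
            {"name": "dashboard", "port": 8787, "targetPort": 8787, "nodePort": 30123},
        ],
    }
}

resolved_port = scheduler_node_port(example_service)
print(f"connect to <node-ip>:{resolved_port}")
```

A client outside the cluster would connect to a node address on the resolved port rather than on 8786; how the node address itself is obtained is a separate question and is left out of this sketch.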
Many tests start the kopf controller via the `kopf_runner` fixture and perform work within the context manager that it provides.

```python
def test_foo(kopf_runner):
    # Start the controller
    with kopf_runner:
        #...
```