Curtis Vogt
Something else to note: completed/terminated pods are eventually cleaned up by [pod garbage collection](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-garbage-collection), but that isn't something we can make use of here.
Copying relevant logs here as GHA logs don't persist: ``` [ Info: Waiting for test-multi-addprocs job. This could take up to 4 minutes... Error from server (NotFound): pods "test-multi-addprocs-st7tc" not...
It appears the manager job was terminated and removed before debugging information could be rendered. This probably means we want to adjust some TTL settings so this can be debugged further.
Note that this is the only cluster test to fail and this is also the only cluster test to make use of `--sort-by`. I suspect what is going on is...
Seems to have been introduced with #48.
I like the concept of dynamically adding workers into a pool but unfortunately we're restricted by how Distributed.jl works at the moment. Attempting to do this with the existing Distributed.jl...
> Are you sure we can't dynamically add workers to a pool? Here's the code that calls the `launch` method defined by the Distributed interface: https://github.com/JuliaLang/julia/blob/c2b4b382c11b5668cb9091138b1fa9178c47bff5/stdlib/Distributed/src/cluster.jl#L480-L499 You're expected to add...
I may have thought of a workaround to this problem. If we define an alternative `addprocs` function, maybe `spawn`, what we could do internally is call `addprocs` asynchronously adding...
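A rough sketch of the asynchronous idea, using only local workers from the Distributed stdlib. The `spawn` name and its signature are hypothetical (nothing like it exists in Distributed.jl or K8sClusterManagers.jl), and a real version would pass a cluster manager rather than a worker count:

```julia
using Distributed

# Hypothetical helper, for illustration only: start `n` workers without
# blocking the caller. Each task wraps a single `addprocs` call and yields
# the vector of worker IDs that call returns.
function spawn(n::Integer; kwargs...)
    return [@async addprocs(1; kwargs...) for _ in 1:n]
end

# The caller gets the tasks back immediately and can keep working.
tasks = spawn(2)

# Waiting on the tasks collects the IDs of the workers that came up.
pids = reduce(vcat, fetch.(tasks))
```

Note that the caller is not blocked here, but as far as I can tell Distributed serializes concurrent `addprocs` calls internally, so the workers may still be launched one after another.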
Note that you can use https://github.com/beacon-biosignals/K8sClusterManagers.jl#advanced-configuration to handle this currently. I agree that there are ways to make this user story better though.
Overall I agree with the premise that it would be good to be able to support a Revise-based workflow without having to fall back on local Distributed processes. However, I...