Seung Jin

Results 36 comments of Seung Jin

Another case to consider: in a multi-instance cluster scenario, we want different pods belonging to the same cluster to be scheduled on different nodes if possible.
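One way to express "different nodes if possible" is a preferred (soft) pod anti-affinity rule. The sketch below only builds the `affinity` fragment of a pod spec as a plain dict. The helper name and the label key `example.com/cluster-name` are hypothetical and would need to match whatever label the pods of one cluster actually share.

```python
def spread_across_nodes_affinity(label_key: str, cluster_name: str, weight: int = 100) -> dict:
    """Build a pod `affinity` fragment that prefers, but does not require,
    placing pods that carry the same cluster label on different nodes.

    `label_key` is an assumed label shared by all pods of one cluster.
    """
    return {
        "podAntiAffinity": {
            # "preferred..." is a soft rule: the scheduler still places the pod
            # if no other node is available.
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": weight,  # Kubernetes accepts 1-100
                    "podAffinityTerm": {
                        "labelSelector": {
                            "matchLabels": {label_key: cluster_name}
                        },
                        # One topology domain per node.
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


# Hypothetical label key and cluster name, for illustration only.
affinity = spread_across_nodes_affinity("example.com/cluster-name", "my-cluster")
```

A soft rule fits the "if possible" wording: a small cluster with fewer nodes than pods would still schedule, whereas `requiredDuringSchedulingIgnoredDuringExecution` would leave the extra pods pending.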

Looks pretty good to me! I'll approve once the tests are passing (which they should after a rebase) and the use of `reload` instead of `restart` is considered and decided...

For personal reference, how `--grace-period=0 --force` translates into the Python API: https://github.com/kubernetes-client/python/issues/508

It _seems_ like a nonzero `grace_period` might not act like `--force` based on [here](https://github.com/kubernetes-client/python/issues/508#issuecomment-1695759777) and [here](https://github.com/kubernetes/kubectl/blob/826006cdb947f80a679ff1eb3cb53f183a6a9bf2/pkg/cmd/delete/delete.go#L285-L286) - is there a reason 10 seconds was chosen as the grace period?
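To make the question concrete, here is a rough sketch of how I read the flag handling in the linked `delete.go`. This is my interpretation, not a verified port, and the function name is made up. `-1` stands for "flag not set".

```python
def effective_grace_period(grace_period: int, force: bool) -> int:
    """Approximate how `kubectl delete` seems to normalize its flags
    (an assumption based on reading the linked source; -1 means unset)."""
    if grace_period == 0 and not force:
        # Without --force, a zero grace period appears to be bumped to 1 second.
        return 1
    if force and grace_period < 0:
        # --force alone appears to imply immediate deletion.
        return 0
    # A nonzero value is passed through unchanged, even with --force.
    return grace_period


# With the Python client, the resulting value would be passed roughly as:
#   core_v1.delete_namespaced_pod(name, namespace, grace_period_seconds=0)
# (not executed here, since it needs a live cluster).
```

Under this reading, `--force` with a 10 second grace period still sends a 10 second grace period to the API server, so only a value of 0 would give the immediate-deletion behavior.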

Yes! I've assigned you this issue, feel free to give it a go.

> nice, like it. do we have any test coverage specifically on the labels/nodeselector code?

No unit tests, because the code paths here do need a k8s cluster to interact...

Re: UI, I do agree on having a table showing aggregated GPU availability across all clusters. I actually think such a table should be at the top, because the current UI...

I actually think that if a node doesn't contain any GPUs, then it shouldn't show up in the table.

Tried reproducing this with:

```
file_mounts:
  /cloudflare:
    name:
    source: ~/yamls
    store: r2
    mode: MOUNT
```

This command actually errors out for me with `upload failed: ../../yamls/cloudflare.yaml to s3:///cloudflare.yaml An error...