MEllis-github

Results 9 comments of MEllis-github

Note: this may be resolved by https://github.com/hpcaitech/ColossalAI/pull/3266, depending on the final version of the PR.

Same error for macOS running

```
pip install -U pymemgpt
memgpt quickstart --backend memgpt
```

Not triggered with

```
pip install -e .
memgpt quickstart --backend memgpt
```

Git branch:...

If a complete solution from MCAD is not possible, one partial solution supporting external automation for handling this problem would be MCAD emitting a warning or error (and/or AppWrapper state...

Excellent distinction, and idea, thank you! Yes, we are deriving the label from the job name, but this is also not changed between re-queuings (or not currently if your proposal...

* I agree that a comprehensive solution also depends on the behavior of the respective controller. The training-operator behavior is of concern for PyTorchJob instances. Jobs are also submitted to...

Building on this [idea](https://github.com/project-codeflare/multi-cluster-app-dispatcher/issues/599#issuecomment-1688653900) and this [concept](https://github.com/kubeflow/common/pull/139), does MCAD internally represent and track the expectation that a pod deleted while "rewrapping" and requeuing an AppWrapper should be gone before the...

From discussion, the consensus is to address this in MCAD by introducing support for monitoring resource deletion, i.e., MCAD will not only issue deletion of resources as it does now,...
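
To illustrate the "issue deletion, then monitor it" idea in the comment above, here is a minimal Python sketch. It is not MCAD code (MCAD is written in Go) and does not call any Kubernetes API: `delete_and_wait`, `FakeCluster`, and all parameter names are hypothetical, and the in-memory cluster only simulates pods that linger briefly after deletion is requested.

```python
import time
from typing import Callable, Iterable, Set


def delete_and_wait(
    names: Iterable[str],
    delete: Callable[[str], None],
    exists: Callable[[str], bool],
    timeout_s: float = 5.0,
    poll_s: float = 0.01,
) -> Set[str]:
    """Issue deletion for each resource, then poll until all are gone.

    Returns the names still present when the timeout expires; an empty
    set means every deletion was confirmed.
    """
    pending = set(names)
    for name in pending:
        delete(name)  # step 1: request deletion (what happens today)
    deadline = time.monotonic() + timeout_s
    # step 2: monitor until the resources have actually disappeared
    while pending and time.monotonic() < deadline:
        pending = {n for n in pending if exists(n)}
        if pending:
            time.sleep(poll_s)
    return pending


class FakeCluster:
    """Hypothetical stand-in: a deleted pod survives a few exists() checks."""

    def __init__(self, pods, linger_checks=3):
        self.running = set(pods)
        self.terminating = {}  # name -> checks left before the pod is gone
        self.linger_checks = linger_checks

    def delete(self, name):
        if name in self.running:
            self.running.remove(name)
            self.terminating[name] = self.linger_checks

    def exists(self, name):
        if name in self.running:
            return True
        left = self.terminating.get(name)
        if left is None:
            return False
        if left == 0:
            del self.terminating[name]
            return False
        self.terminating[name] = left - 1
        return True


cluster = FakeCluster(["pod-a", "pod-b"])
leftover = delete_and_wait(["pod-a", "pod-b"], cluster.delete, cluster.exists)
```

Under this sketch, a requeue would proceed only when the returned set is empty; a non-empty set is the point at which a dispatcher could emit a warning or hold the job back.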

Is [this](https://github.com/project-codeflare/multi-cluster-app-dispatcher/issues/262#issuecomment-1412395279) proposal for MCAD's helm charts or for MCAD-external users/tools?

Multiple methods are currently used for submitting AppWrappers, e.g., the codeflare CLI tool, torchX+MCAD, custom helm charts and scripts, in addition to direct manifest editing and...