Fiona Waters

Results 7 issues of Fiona Waters

Skipping the MCAD CPU Preemption Test, which is failing intermittently on PRs, so that we can get some outstanding PRs merged.

# Issue link Closes #613 # What changes have been made Added error handling and logging. This issue has not been fully completed as we will now be working on...

While installing MCAD on an OSD 4.13 cluster, the following error was logged: `queuejobagent.go:64] [agentEventQueue] Invalid agent configuration: null. Agent cluster will not be instantiated.` Investigate the cause of this...

# Issue link [RHOAIENG-6475](https://issues.redhat.com/browse/RHOAIENG-6475) # What changes have been made When running GPU-utilising workloads using the basic_interactive demo notebook, I encountered a lot of errors while running the training...

## WHY The current notebook image used in the custom-nb-image uses packages that are no longer supported. ## WHAT Update the base image [here](https://github.com/project-codeflare/codeflare-sdk/blob/main/custom-nb-image/Dockerfile#L15) to reflect the update in this...

## WHY To tidy up and reduce the time taken to run the unit tests action. ## WHAT Cache Poetry dependencies in the GitHub workflow unit-tests.yml file. ## HOW Update the unit-tests.yml file...

**Description of your changes:** This PR will resolve https://github.com/kubeflow/training-operator/issues/2068. I have updated the PyTorch launcher component to use v2 constructs. I have also updated the PyTorch launcher component to use...
