Xinyuan Huang

Results 46 issues of Xinyuan Huang

This adds a template runner, which serves as a template for custom runners. The template runner is provided only for the convenience of new runner development; it cannot be used as...

This adds a simple base runner that provides common runner functionality. This is part of runner standardization / runner spec implementation #66. Depends on #77 and #88. Closes #76.

This is caused by using "successCondition" in the Argo step to track the status of the created Kubeflow resources (tfjob), which causes the step to time out in a few minutes...

priority/p2

We need a monitor function in the kubebench controller image that polls the status of deployed Kubeflow jobs until the desired status (success/fail/etc.) is met. The monitor will be run in...

priority/p1
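
The polling behaviour described in the monitor issue above can be illustrated with a minimal sketch. It is not the Kubebench implementation: `wait_for_status`, the `get_status` callable, and the status strings `"Succeeded"` and `"Failed"` are all hypothetical names chosen for illustration, and the real monitor would query the Kubernetes API for the job's status instead.

```python
import time


def wait_for_status(get_status, desired=("Succeeded", "Failed"),
                    interval=5.0, timeout=None,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns one of the desired statuses.

    get_status is a hypothetical callable returning the job's current
    status as a string; the status names are assumptions, not values
    taken from Kubebench or Kubeflow.
    """
    deadline = None if timeout is None else clock() + timeout
    while True:
        status = get_status()
        if status in desired:
            return status  # terminal state reached
        if deadline is not None and clock() >= deadline:
            raise TimeoutError(f"status still {status!r} after {timeout}s")
        sleep(interval)  # wait before the next poll
```

Passing `sleep` and `clock` as parameters is only a convenience so the loop can be exercised without real waiting; with `timeout=None` the function polls indefinitely, which avoids the short step timeout mentioned in the earlier issue.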

Support for TFJob, PyTorchJob, and MPIJob is incomplete in the new v1alpha2 code and needs to be added back. Required changes for each framework: - Add manifest...

The following changes are made in the v1alpha2 design: - Remove the `secretSpecs` field; secret configs can be handled directly by the user by mounting them to the workflow agent or workload resources through...

We want to support inputs/outputs in the tasks so that data can be transferred from/to external data sources (e.g. object storage). The inputs/outputs fields should be added under the task...

With the introduction of the new Kubebench CRD, there have been major changes to the way Kubebench is used. The design doc and user guide need to be...

The tf-cnn benchmark example needs to be reworked to fit in v1alpha2.