Wang Zhang

10 issues by Wang Zhang

**Describe the bug** When I was testing Triton Inference Server 19.10, GPU memory usage increased when the following two functions were called: 1. `cuCtxGetCurrent` 2. `cuModuleGetFunction`. It seems when loading...

**Is this a BUG REPORT or FEATURE REQUEST?**: /kind feature **Status**: So far FTLib does not support TensorFlow. When it is adopted in ElasticDL, we take a NumPy `ndarray` and wrap it...

kind/feature

**Is this a BUG REPORT or FEATURE REQUEST?**: /kind bug **What happened**: This is an issue I encountered when...

kind/bug

**Is this a BUG REPORT or FEATURE REQUEST?**: /kind feature **What happened**: This might not be urgent at the moment, but before we let contributors help with more DL frameworks...

kind/feature

On the master branch, the TensorFlow Go bindings have been upgraded to version 1.11.0, but the Makefile still pins libtensorflow to version 1.8.0: https://github.com/oracle/graphpipe-go/blob/master/cmd/graphpipe-tf/Makefile#L151

The common repo offers a `controller.v1` package, which is designed for the low-level controller mode in kubebuilder. As we are working on a unified controller, it seems a `reconciler.v1` package will...

While the comments for `CleanPodPolicy` and `RestartPolicy` define what these policies mean, there is no explanation for `CleanPodPolicyAll`, `CleanPodPolicyRunning`, `CleanPodPolicyNone`, `RestartPolicyAlways`, `RestartPolicyOnFailure` and `RestartPolicyNever`.
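A minimal sketch of the kind of per-constant doc comments the issue asks for. It is not the package's actual source: the string values and the described semantics are assumptions (the restart policies are assumed to mirror Kubernetes pod restart policies), and `shouldDeletePod` and `shouldRestart` are hypothetical helpers added only to make the documented behavior concrete.

```go
package main

import "fmt"

// CleanPodPolicy describes how to deal with pods when the job is finished.
type CleanPodPolicy string

const (
	// CleanPodPolicyAll (assumed semantics): delete every pod of the job once it finishes.
	CleanPodPolicyAll CleanPodPolicy = "All"
	// CleanPodPolicyRunning (assumed semantics): delete only the pods still running when the job finishes.
	CleanPodPolicyRunning CleanPodPolicy = "Running"
	// CleanPodPolicyNone (assumed semantics): keep all pods after the job finishes.
	CleanPodPolicyNone CleanPodPolicy = "None"
)

// RestartPolicy describes how replicas should be restarted.
type RestartPolicy string

const (
	// RestartPolicyAlways (assumed semantics): restart the replica whenever it exits.
	RestartPolicyAlways RestartPolicy = "Always"
	// RestartPolicyOnFailure (assumed semantics): restart only when the replica exits with a non-zero code.
	RestartPolicyOnFailure RestartPolicy = "OnFailure"
	// RestartPolicyNever (assumed semantics): never restart the replica.
	RestartPolicyNever RestartPolicy = "Never"
)

// shouldDeletePod is a hypothetical helper showing how a controller could apply CleanPodPolicy.
func shouldDeletePod(policy CleanPodPolicy, podRunning bool) bool {
	switch policy {
	case CleanPodPolicyAll:
		return true
	case CleanPodPolicyRunning:
		return podRunning
	default: // CleanPodPolicyNone and unknown values keep the pod
		return false
	}
}

// shouldRestart is a hypothetical helper showing how a controller could apply RestartPolicy.
func shouldRestart(policy RestartPolicy, exitCode int) bool {
	switch policy {
	case RestartPolicyAlways:
		return true
	case RestartPolicyOnFailure:
		return exitCode != 0
	default: // RestartPolicyNever and unknown values never restart
		return false
	}
}

func main() {
	fmt.Println(shouldDeletePod(CleanPodPolicyRunning, false)) // false: the pod already stopped
	fmt.Println(shouldRestart(RestartPolicyOnFailure, 1))     // true: non-zero exit code
}
```

Putting the explanation on each constant, rather than only on the type, means it shows up in `go doc` output for the individual values.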

#172 breaks training-operator when installing the CRD. Steps to reproduce: 1. change go.mod and go.sum ```shell diff --git a/go.mod b/go.mod index 58f089d7..d491f115 100644 --- a/go.mod +++ b/go.mod @@ -5,7 +5,7 @@ go...

In https://github.com/kubeflow/common/pull/141, methods implemented as `panic("implement me!")`, which were meant to be overridden by developers, were removed when the pull request was merged. However, when creating a demo reconciler or even...
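The pattern referred to above, placeholder methods that panic until a developer overrides them, can be sketched in Go with struct embedding. This is a hedged illustration only: `Reconciler`, `BaseReconciler`, `DemoReconciler`, `GetJobName` and `ReconcilePods` are hypothetical names, not the removed kubeflow/common methods.

```go
package main

import "fmt"

// Reconciler is a hypothetical interface a job reconciler must satisfy.
type Reconciler interface {
	GetJobName() string
	ReconcilePods() error
}

// BaseReconciler (hypothetical) supplies defaults. Methods that every concrete
// reconciler must provide panic, so a missing override fails loudly at runtime.
type BaseReconciler struct{}

// GetJobName is a placeholder meant to be overridden.
func (BaseReconciler) GetJobName() string { panic("implement me!") }

// ReconcilePods stands in for shared default logic that can be reused as-is.
func (BaseReconciler) ReconcilePods() error { return nil }

// DemoReconciler embeds the base and overrides only the placeholder method.
type DemoReconciler struct {
	BaseReconciler
	name string
}

// GetJobName shadows the embedded BaseReconciler.GetJobName.
func (d DemoReconciler) GetJobName() string { return d.name }

func main() {
	var r Reconciler = DemoReconciler{name: "demo"}
	fmt.Println(r.GetJobName())          // overridden method
	fmt.Println(r.ReconcilePods() == nil) // promoted default from BaseReconciler
}
```

One caveat of this design: Go has no virtual dispatch, so if `BaseReconciler.ReconcilePods` itself called `GetJobName`, it would reach the panicking placeholder rather than the override.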

1. Remove the duplicated repos for tf-operator, pytorch-operator and mpi-operator. 2. Add the kubeflow/training-operator repo to support Kubeflow training on Kubernetes 1.22+.