happy2048

Results 17 comments of happy2048

We use the https://github.com/AliyunContainerService/tf-operator/tree/v1.0-aliyun-branch branch to build tf-operator for arena. This branch includes some custom features that differ from kubeflow/tf-operator.

> Can we merge the change [AliyunContainerService/tf-operator@61b5919](https://github.com/AliyunContainerService/tf-operator/commit/61b59190c296c3cd6ad6ec12058d0eb27029c4de) to the mainstream?

OK, I will submit a PR.

Please refer to the doc: https://arena-docs.readthedocs.io/en/latest/installation/uninstall/. If the 0.8.4 package is not found, run these commands:

```
$ git clone https://github.com/kubeflow/arena
$ cd arena
$ go build -o arena-uninstall cmd/uninstall/uninstall.go
...
```

Hi yuanbw, this may not be an arena bug. Please refer to https://github.com/kubeflow/tf-operator/issues/267 and https://github.com/kubeflow/tf-operator/issues/238.

For now, you can do this by changing the serviceType in the serving chart under ~/charts. Later, we will consider passing the service type in from the command line.
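A minimal sketch of that edit, assuming the serving chart's `values.yaml` holds a top-level `serviceType:` key and GNU sed is available. The helper name `set_service_type` and the `<serving-chart>` directory are hypothetical; check the actual chart layout under ~/charts before using it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite a top-level "serviceType:" line in a Helm values file.
# Assumes GNU sed (BSD/macOS sed needs `sed -i ''` instead of `sed -i`).
set_service_type() {
  local values_file="$1"
  local new_type="$2"   # a Kubernetes Service type, e.g. ClusterIP, NodePort or LoadBalancer
  # Refuse to touch files that lack the expected top-level key.
  if ! grep -q '^serviceType:' "$values_file"; then
    echo "no top-level serviceType in $values_file" >&2
    return 1
  fi
  # Replace the whole value on that line, leaving every other line unchanged.
  sed -i "s/^serviceType:.*/serviceType: ${new_type}/" "$values_file"
}
```

For example: `set_service_type ~/charts/<serving-chart>/values.yaml NodePort`, then submit the serving job again so the chart is rendered with the new value.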

Ok, I will test it and report the result.

@shivamerla I updated dcgm-exporter to 2.3.5-2.6.5-ubuntu20.04, removed the env DCGM_REMOTE_HOSTENGINE_INFO to enable embedded mode, and set the GPU metrics collection interval to 6000 (so metrics are generated quickly). ![image](https://user-images.githubusercontent.com/18654706/166855675-9172ae05-5208-4773-bf26-581985b2858a.png) the...

@shivamerla In https://github.com/NVIDIA/dcgm-exporter/blob/main/pkg/dcgmexporter/dcgm.go#L83, is setting maxKeepAge to 0.0 correct? Does 0.0 mean no limit?

@shivamerla Is there a conclusion?