Yuan Tang

Results 625 comments of Yuan Tang

cc @Jeffwan @gaocegege @johnugeorge @ywskycn @merlintang WDYT? Any objections to bringing this type of error to the Kubeflow CR level? It would be convenient to surface this at CR level status...

Let's use https://github.com/kubeflow/training-operator/issues/1507 to track and discuss separately.

Sounds great to me. This would be a good way to standardize metrics collection. We could also expose some utility methods that operators can use to collect operator-specific custom metrics,...

Hi all, I added a detailed outline of the Prometheus metrics we plan to cover in the common operator in https://github.com/kubeflow/common/pull/77. Please take a look; any feedback would be appreciated.

Agreed. Having a unified interface would make it easier for downstream apps to consume the logs.

cc @ShuhanYan @carmark in case you are interested

Progress is being tracked in individual repos:
- MXNet Operator https://github.com/kubeflow/mxnet-operator/issues/66
- MPI Operator https://github.com/kubeflow/mpi-operator/issues/217
- XGBoost Operator https://github.com/kubeflow/xgboost-operator/issues/44

Also, here are some good references on criteria, processes, and past exits: https://github.com/kubeflow/community/blob/master/guidelines/application_requirements.md#reference