mxnet-operator
A Kubernetes operator for mxnet jobs
```
➜ mxnet-operator git:(master) go build -o defaulter-gen /Users/jiaxin/go/pkg/mod/k8s.io/[email protected]/cmd/defaulter-gen
-> 1.16.10 work
➜ mxnet-operator git:(master) ✗ go build -o defaulter-gen /Users/jiaxin/go/pkg/mod/k8s.io/[email protected]/cmd/defaulter-gen
go: directory ../../../../pkg/mod/k8s.io/[email protected]/cmd/defaulter-gen outside available modules
-> 1.19.x doesn't...
```
Inspired by https://github.com/kubeflow/pipelines/issues/4682, I created a script that generates a config file for Dependabot so that it knows which directories to scan. It will scan the repository for files...
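The script itself is not shown in this excerpt. As a rough illustration of the idea only, here is a minimal sketch that walks a repository, looks for a few common manifest files, and renders a Dependabot version 2 config with one `updates` entry per directory found. The `MANIFESTS` mapping, the skipped directory names, the weekly interval, and the helper names `find_targets` and `render_config` are all assumptions made for this example, not taken from the original script.

```python
import os

# Assumed mapping from manifest file name to Dependabot package ecosystem.
MANIFESTS = {
    "go.mod": "gomod",
    "requirements.txt": "pip",
    "Dockerfile": "docker",
    "package.json": "npm",
}

# Assumed set of directories that should not be scanned.
SKIP_DIRS = {".git", "vendor", "node_modules"}


def find_targets(repo_root):
    """Return sorted (ecosystem, directory) pairs for every manifest found."""
    targets = set()
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        rel = os.path.relpath(dirpath, repo_root)
        # Dependabot expects directories relative to the repository root.
        directory = "/" if rel == "." else "/" + rel.replace(os.sep, "/")
        for name in filenames:
            if name in MANIFESTS:
                targets.add((MANIFESTS[name], directory))
    return sorted(targets)


def render_config(targets, interval="weekly"):
    """Render the (ecosystem, directory) pairs as dependabot.yml text."""
    lines = ["version: 2", "updates:"]
    for ecosystem, directory in targets:
        lines += [
            f'  - package-ecosystem: "{ecosystem}"',
            f'    directory: "{directory}"',
            "    schedule:",
            f'      interval: "{interval}"',
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the config for the current directory; redirect to .github/dependabot.yml.
    print(render_config(find_targets(".")), end="")
```

Run from the repository root, the output could be redirected to `.github/dependabot.yml`. Emitting the YAML by hand keeps the sketch free of third-party dependencies; a real script might prefer a YAML library.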
There are Python SDKs for tf-operator and pytorch-operator. Why doesn't mxnet-operator have one?
The community is asking each WG to own its infra and will no longer provide a common shared testing infra. See kubeflow/testing#752 for more details. The PyTorch migration works well and here's...
Currently, the mxnet-operator API doesn't follow the `kubeflow/common` convention, but the controller does. It imports the tf-operator implementation: `https://github.com/kubeflow/mxnet-operator/blob/c718707b304dc1ed0210a740c8efe7071d4ebb3e/pkg/controller.v1/mxnet/controller.go#L42` To graduate mxnet-operator to v1, we'd like to migrate the API to follow the `kubeflow/common` convention....
I created this parent ticket to track all the changes needed to graduate mxnet-operator to v1. ### Configuration and deployment Description | Category | Status | Issue -- | -- |...
Hello, Thank you for working on mxnet-operator for kubeflow. I see that mxnet-operator is not installed by default in Kubeflow. If it is available by default, it would help users...
**kubeflow version**: 0.5.0 **mxnet-operator version**: v1beta1 **kubernetes dashboard display**:  **worker-0 log**: INFO:root:start with arguments Namespace(add_stn=False, batch_size=64, data_dir='/admin/public/model/mxnet_distributed/data', disp_batches=10, dtype='float32', gc_threshold=0.5, gc_type='none', gpus='0', image_shape='1, 28, 28', initializer='default', kv_store='dist_device_sync', load_epoch=None, loss='',...
A couple of minor API changes have been suggested. We can incorporate all of them in the next API version. Related: kubeflow/tf-operator#935 - [x] Requires support of Status subresource...