
ML end to end example

Open goswamig opened this issue 2 years ago • 0 comments

We plan to add documentation on large-scale distributed training (along with TensorBoard integration) and, finally, model serving.

Many users might want to learn how to get started with large-scale distributed training on GPU instances.

It would be worth having an example that trains a model such as Mask R-CNN or BERT on Kubeflow with AWS service integration.

We can split the example into two sub-examples:

  1. Parameter-server-based
  2. Horovod-based
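
To make the difference between the two sub-examples concrete, here is a toy, single-process sketch of one training step in each style. It is only an illustration under stated assumptions: the function names (`parameter_server_step`, `allreduce_step`) are hypothetical, the gradients are made-up numbers, and nothing here calls Kubeflow, AWS, or Horovod. The all-reduce variant stands in for the Horovod style, where every worker keeps its own model replica and receives the same averaged gradient.

```python
# Toy simulation only: no networking, no Kubeflow, no Horovod calls.
# All names and numbers below are hypothetical, chosen for illustration.
from typing import List


def average_gradients(worker_grads: List[List[float]]) -> List[float]:
    """Element-wise mean of the gradients reported by each worker."""
    n = len(worker_grads)
    dim = len(worker_grads[0])
    return [sum(g[i] for g in worker_grads) / n for i in range(dim)]


def parameter_server_step(
    weights: List[float], worker_grads: List[List[float]], lr: float
) -> List[float]:
    """A central server holds the only weight copy and applies the averaged update."""
    avg = average_gradients(worker_grads)
    return [w - lr * a for w, a in zip(weights, avg)]


def allreduce_step(
    replicas: List[List[float]], worker_grads: List[List[float]], lr: float
) -> List[List[float]]:
    """Each worker holds a replica; after all-reduce, every replica applies the same update."""
    avg = average_gradients(worker_grads)
    return [[w - lr * a for w, a in zip(rep, avg)] for rep in replicas]


weights = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, 3.0]]  # one gradient per simulated worker

ps_weights = parameter_server_step(weights, grads, lr=0.5)
ar_replicas = allreduce_step([list(weights), list(weights)], grads, lr=0.5)

print(ps_weights)
print(ar_replicas)
```

Both styles produce the same weights in this toy setting. In practice they differ in where the weights live and how gradients travel between workers, which is what the two sub-examples would demonstrate on real GPU instances.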

goswamig, Apr 18 '22 19:04