Jeremy Lewi

Results 203 comments of Jeremy Lewi

@danijar So basically you want to do data-parallel [replicated training](https://www.tensorflow.org/deploy/distributed#replicated_training) with TensorFlow, where each replica gets data from a set of environments running colocated with TF. I think...

> I think in many scenarios it makes sense to simulate and train on the same machine, and just scale the number of those machines. That's mainly because it seems...

@cwbeitel any interest in trying to turn @danijar's gist into an example running on K8s?

@danijar This sounds like standard [between-graph replication](https://www.tensorflow.org/deploy/distributed#replicated_training). Do we need to do anything special?

It really depends on how the code is set up. Running ops on the master ("chief") is quite common. Typically worker 0 is the chief and also runs computations.
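To make the "worker 0 is the chief" convention concrete, here is a minimal sketch of how a `TF_CONFIG` environment value is typically assembled for between-graph replication. The cluster addresses are hypothetical; the `is_chief` check mirrors the common convention that the worker with index 0 acts as chief while still running training ops.

```python
import json
import os

# Hypothetical cluster: three workers and one parameter server.
# Worker 0 doubles as the chief and also runs training computations.
cluster = {
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
    "ps": ["ps0:2222"],
}

def tf_config_for(task_type, task_index):
    """Build the TF_CONFIG environment value for one replica."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })

# Each replica gets its own TF_CONFIG; here we simulate worker 0.
os.environ["TF_CONFIG"] = tf_config_for("worker", 0)

cfg = json.loads(os.environ["TF_CONFIG"])
# Conventional chief check: worker of index 0.
is_chief = cfg["task"]["type"] == "worker" and cfg["task"]["index"] == 0
```

On Kubernetes, an operator such as tf-operator injects a `TF_CONFIG` of this shape into each replica's pod, so the chief does not need to be a separate process type.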

This seems like a useful feature. I investigated a bit, and it looks like we'd want to modify the request handler template https://github.com/grpc-ecosystem/grpc-gateway/blob/9ec62387b4d04e454fcc84ab8f7d0d0c11dddde1/protoc-gen-grpc-gateway/internal/gengateway/template.go#L412 to test the oneof and return the...

@kunmingg could you elaborate on what the goal of this issue is and what work needs to be completed?

@vielmetti that sounds great. How can we help? My suggestion would probably be to take one of the examples, e.g. [mnist](https://github.com/kubeflow/examples/tree/master/mnist), try to run it on Arm, and just see...

@MrXinWang I would suggest talking to @jinchihe and looking at what we are doing for Power in #4133; we should try to follow the same approach for Arm as we are...