Jeremy Lewi

Results 203 comments of Jeremy Lewi

@danijar So basically you want to do data-parallel [replicated training](https://www.tensorflow.org/deploy/distributed#replicated_training) with TensorFlow, where each replica gets data from a set of environments running colocated with TF. I think...

> I think in many scenarios it makes sense to simulate and train on the same machine, and just scale the number of those machines. That's mainly because it seems...

@cwbeitel any interest in trying to turn @danijar's gist into an example running on K8s?

@danijar This sounds like standard [between-graph replication](https://www.tensorflow.org/deploy/distributed#replicated_training). Do we need to do anything special?

It really depends on how the code is set up. Running ops on the master ("chief") is quite common. Typically worker 0 is the chief and also runs computations.
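To make the "worker 0 is the chief" convention concrete, here is a minimal sketch of how a `TF_CONFIG` environment value is typically assembled for between-graph replication. The cluster addresses are hypothetical; the `is_chief` check mirrors the common convention that the worker with index 0 acts as chief while still running training ops.

```python
import json
import os

# Hypothetical cluster: three workers and one parameter server.
# Worker 0 doubles as the chief and also runs training computations.
cluster = {
    "worker": ["worker0:2222", "worker1:2222", "worker2:2222"],
    "ps": ["ps0:2222"],
}

def tf_config_for(task_type, task_index):
    """Build the TF_CONFIG environment value for one replica."""
    return json.dumps({
        "cluster": cluster,
        "task": {"type": task_type, "index": task_index},
    })

# Each replica gets its own TF_CONFIG; here we simulate worker 0.
os.environ["TF_CONFIG"] = tf_config_for("worker", 0)

cfg = json.loads(os.environ["TF_CONFIG"])
# Conventional chief check: worker of index 0.
is_chief = cfg["task"]["type"] == "worker" and cfg["task"]["index"] == 0
```

On Kubernetes, an operator such as tf-operator injects a `TF_CONFIG` of this shape into each replica's pod, so the chief does not need to be a separate process type.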

This seems like a useful feature. I investigated a bit, and it looks like we'd want to modify the request handler template https://github.com/grpc-ecosystem/grpc-gateway/blob/9ec62387b4d04e454fcc84ab8f7d0d0c11dddde1/protoc-gen-grpc-gateway/internal/gengateway/template.go#L412 to test the oneof and return the...

@kunmingg could you elaborate on what the goal of this issue is and what work needs to be completed?

@vielmetti that sounds great. How can we help? My suggestion would probably be to take one of the examples, e.g. [mnist](https://github.com/kubeflow/examples/tree/master/mnist), try to run it on Arm, and just see...

@MrXinWang I would suggest talking to @jinchihe and looking at what we are doing for Power in #4133; we should try to follow the same approach for Arm as we are...