Abin Shahab

8 comments by Abin Shahab

@EnricoMi, thanks for catching this. The issue is that the default value of `placement_group_timeout_s` is not being applied. I'll try to take a look this week.

@n-balla TensorFlow Keras has callbacks that let you access the current step number. If you are implementing a custom loop, then each worker will have access to the...
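A minimal sketch of such a callback, assuming TensorFlow 2.x. The class name `StepCounter` and its `global_step` attribute are hypothetical. Keras passes the batch index *within the current epoch* to `on_train_batch_end`, so the callback keeps its own running count to get a global step across epochs.

```python
import tensorflow as tf


class StepCounter(tf.keras.callbacks.Callback):
    """Hypothetical callback that tracks the global training step."""

    def __init__(self):
        super().__init__()
        self.global_step = 0  # total batches seen across all epochs

    def on_train_batch_end(self, batch, logs=None):
        # `batch` resets to 0 at the start of every epoch, so keep a
        # separate running counter for the global step number.
        self.global_step += 1
```

Pass it to training with `model.fit(..., callbacks=[StepCounter()])`. In a Horovod job each worker process runs its own instance, so the count is per worker.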

@tanmoyio, by Databricks, do you mean you are running the jobs (PyTorch? TensorFlow?) inside Spark? Can you explain how you are doing inference?

@yundai424 I am wondering if it's related to the other callbacks on that [example](https://github.com/horovod/horovod/blob/master/examples/tensorflow2/tensorflow2_keras_mnist.py#L74) that allreduce at epoch boundaries. Can you try removing those callbacks to narrow the problem down?

Actually, I do have time next week if you're fine waiting. This is an awesome project, and I'd like to contribute. On Thu, Mar 21, 2019, 11:35 PM Ce Gao...

Can you elaborate on the following: "Define the controller logic in the controllers directory. This is where you will download the protobuf file, parse its contents into a Ray DAG,...

What would the reconciliation loop of the controller do?

Can you implement the reconciliation loop in golang?