chongxiaoc
It looks like this PR needs to be rebased after https://github.com/horovod/horovod/pull/3665 is merged; `make_dataset_fn()` in keras is changed there.
@leewyang @EnricoMi I will take a look asap. Thanks.
From my (limited) understanding of using Horovod with TensorFlow:
- Collective operations like allreduce, allgather, and broadcast are used for communication between ranks, rather than being defined as TensorFlow...
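To make that concrete, here is a minimal sketch of the three collectives mentioned, using `horovod.tensorflow`; the tensor values are illustrative only:

```python
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()
t = tf.constant([float(hvd.rank())])

reduced = hvd.allreduce(t, op=hvd.Sum)   # every rank receives the sum across ranks
gathered = hvd.allgather(t)              # every rank receives all ranks' tensors concatenated
synced = hvd.broadcast(t, root_rank=0)   # every rank receives rank 0's tensor
```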
@loic-ehrhardt Your fix is helpful; I will look at it.
Can you try this patch as well? It looks like the allreduce kernel multiplies by `postscale_factor` internally, so we have to convert it back when calculating the local grad. Though I'm not sure...
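For concreteness, a minimal sketch of what "converting it back" could look like, assuming the `postscale_factor` argument of `horovod.tensorflow.allreduce`; the factor `f` here is purely illustrative:

```python
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()

# With op=hvd.Sum and postscale_factor=f, the returned tensor is
# f * sum_over_ranks(t), so the raw sum can be recovered by dividing by f.
f = 1.0 / hvd.size()
t = tf.constant([1.0, 2.0, 3.0])
summed = hvd.allreduce(t, op=hvd.Sum, postscale_factor=f)
raw_sum = summed / f  # undo the internal postscale when computing the local grad
```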
> Thanks @chongxiaoc.
>
> Interesting, the `test_tensorflow.TensorFlowTests().test_horovod_allgather_grad_cpu()` fails... same for `test_horovod_broadcast_grad_cpu()`.
>
> Is this scaling really expected? When running the following in `horovod/test/parallel`:
>
> ```
> horovodrun...
Okay, now I think a common question arising for both of us is: why does `horovod` use `sum` to allreduce the grads of collective operations, rather than `average`? For example:...
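For reference, a minimal sketch of the two reduction modes in question in `horovod.tensorflow` (the gradient tensor is illustrative); the results differ by a factor of `hvd.size()`:

```python
import horovod.tensorflow as hvd
import tensorflow as tf

hvd.init()
grad = tf.constant([1.0, 2.0, 3.0])

# Average is Sum divided by the number of ranks:
summed = hvd.allreduce(grad, op=hvd.Sum)
averaged = hvd.allreduce(grad, op=hvd.Average)
# averaged == summed / hvd.size()
```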
@adamelk Is there a simple reproducer I can try?
I think the reason behind the scenes is that the `KerasEstimator` dataloader generates the same number of feature tensors and target tensors to match the model's input and output layers. So if...
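As an illustration of that matching, here is a sketch using `horovod.spark.keras.KerasEstimator`; the column names and model are hypothetical, and a real run would also need a `store` and a Spark session:

```python
import tensorflow as tf
from horovod.spark.keras import KerasEstimator

# Hypothetical two-input, one-output model.
inp_a = tf.keras.Input(shape=(4,), name="a")
inp_b = tf.keras.Input(shape=(4,), name="b")
out = tf.keras.layers.Dense(1)(tf.keras.layers.concatenate([inp_a, inp_b]))
model = tf.keras.Model(inputs=[inp_a, inp_b], outputs=out)

estimator = KerasEstimator(
    model=model,
    optimizer=tf.keras.optimizers.Adam(),
    loss="mse",
    feature_cols=["a", "b"],  # one feature column per model input layer
    label_cols=["y"],         # one label column per model output layer
    batch_size=32,
    epochs=1,
)
```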
Using PTL `1.6.3` hits some errors in the validation step during the fit function:
```
outputs = []

def validation_epoch_end(self, outputs):
>   avg_loss = torch.stack(outputs).mean()
E   RuntimeError: stack expects a non-empty TensorList...
```
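One way to avoid this particular failure, sketched below, is to guard `validation_epoch_end` against an empty `outputs` list; the model here is a hypothetical minimal `LightningModule`, not the one from this PR:

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self.layer(x), y)

    def validation_step(self, batch, batch_idx):
        x, y = batch
        # The returned losses are collected into `outputs` below.
        return nn.functional.mse_loss(self.layer(x), y)

    def validation_epoch_end(self, outputs):
        # Guard against an empty list, which raises the RuntimeError above
        # when no validation batches ran on this rank.
        if not outputs:
            return
        avg_loss = torch.stack(outputs).mean()
        self.log("avg_val_loss", avg_loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)
```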