Steve Lu comments

Results: 4 comments by Steve Lu

tf1.6.0 not working, and did not get good NLL oracle result.

Hi, the two issues may both be due to the implementation of tf.CuDNNLSTM. The reported performance is based on my manually implemented LSTM (as implemented in generator.py, the older...

I think the Bootstrapped Rescaled Activation in the code is not useful

The sorting operation is hidden behind the implementation of the Python "map" object. As it is implemented by a binary search tree, which automatically sorts the inserted elements, the...

RuntimeError: SRU_Compute_GPULegacyBackward is not differentiable twice

Yes, but they are of little help with this problem. The problem is that WGAN-GP has a term in its loss function which is the norm of the original model's...

[BUG]

I've encountered this bug too. After inspection, it seems to me that the following implementation in the current stable release is related:

```python
# In deepspeed/runtime/zero/stage_1_and_2.py
def complete_grad_norm_calculation_for_cpu_offload(self, params):
    total_norm = 0.0
    ...
```