Daniel Rasmussen

Results 86 comments of Daniel Rasmussen

Just checked, and it produces the same error as before. Here are the reproduction steps (I updated the installation instructions to match the changes for TF 2.12 here https://www.tensorflow.org/install/pip#linux): ...

Yes, that makes the problem go away, although I would hesitate to call it a solution as that's quite a cumbersome process to repeat every time we create a new...

As an additional update, I believe that this bug is triggered whenever applying the `Adam` optimizer in a distributed context (I haven't done an exhaustive search over the optimizers, that's...

With a bit more investigation I figured out that what's going on is that `evaluate` is only reporting the loss from the first replica, and ignoring the rest. Here's an...

One more piece of investigation. I believe the above issue with `evaluate` is mainly a display issue. The model is computing the loss value correctly in each replica, but only...
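To make the distinction concrete, here is a minimal sketch in plain Python of the difference between reporting only the first replica's loss and reporting a value aggregated over all replicas. It is not Keras or `tf.distribute` code; the per-replica loss values and both helper functions are hypothetical, and the mean assumes every replica sees an equal-sized batch.

```python
# Illustrative only: plain Python, not Keras or tf.distribute code.
# The per-replica loss values below are made up for the example.
per_replica_loss = [0.9, 0.5, 0.7, 0.3]


def first_replica_only(losses):
    """Mimics the reported symptom: only replica 0's loss is shown."""
    return losses[0]


def mean_over_replicas(losses):
    """Aggregated value, assuming equal-sized batches on every replica."""
    return sum(losses) / len(losses)


reported = first_replica_only(per_replica_loss)
expected = mean_over_replicas(per_replica_loss)

print(f"first replica only: {reported:.2f}")
print(f"mean over replicas: {expected:.2f}")
```

The two numbers differ whenever the replicas see different data, which is why a display-only problem like this can still make results look wrong.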

I think I was able to get past this issue, but then I ran into this bug: https://github.com/keras-team/keras/issues/19246, so I can't really tell whether things are working correctly or not.

No, the issue is not resolved. I had been working on a fix locally, but was unable to verify it due to that other bug. But this issue itself is...

This issue will still require a pull request (or two) of its own to fix; it definitely won't be resolved on its own after #19246 is fixed.

I believe it's actually two separate issues (both requiring fixes). One is the wrong value being returned from `evaluate`. The other is that the gradient aggregation is not happening, so...
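As a generic illustration of why gradient aggregation matters in data-parallel training (not a reproduction of the bug discussed here), the sketch below uses plain Python with a one-parameter model. The data, learning rate, and `grad` helper are all hypothetical, and the mean over a list stands in for an all-reduce across replicas.

```python
# Illustrative only: plain Python, not Keras or tf.distribute code.


def grad(w, x, y):
    """Gradient of the squared error (w * x - y) ** 2 with respect to w."""
    return 2.0 * (w * x - y) * x


# Hypothetical data: one (x, y) example per replica.
shards = [(1.0, 2.0), (2.0, 2.0)]
lr = 0.1
w0 = 0.0

# Each replica computes a gradient on its own shard.
local_grads = [grad(w0, x, y) for x, y in shards]

# Without aggregation, each replica applies only its own gradient,
# so the replicas' copies of the weight drift apart.
unsynced = [w0 - lr * g for g in local_grads]

# With aggregation (a mean, standing in for an all-reduce),
# every replica applies the same update and the copies stay identical.
mean_grad = sum(local_grads) / len(local_grads)
synced = [w0 - lr * mean_grad for _ in shards]

print("without aggregation:", unsynced)
print("with aggregation:   ", synced)
```

In the unaggregated case each replica is effectively training on its own shard alone, while the aggregated update reflects the full batch.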

I haven't had a chance to dig into it more. I believe there was an attempt to fix this here: https://github.com/keras-team/keras/pull/19969, but that was then reverted, so I'm not sure...