bhack

Results: 1416 comments by bhack

If we are really not interested in the Python source code at all for compilation error reporting (though I am really not sure about this point), probably we...

/cc @georgepaw if he is interested in the design as Graphcore has an API for this in: https://github.com/graphcore/tensorflow/blob/r2.5/sdk-release-2.5/tensorflow/python/ipu/optimizers/gradient_accumulation_optimizer.py

> To this day, there are no reliable benchmarks for "real-world models" across frameworks (Keras, PyTorch, JAX/Flax). A "real-world model" is the kind of model that is actually produced and...

The files are extracted correctly but please pay attention to the output dir: https://github.com/keras-team/keras/blob/8c401c032b3021f89609eac79bd1c881b9bbc84f/keras/utils/data_utils.py#L169-L172

Under the hood it uses TF reduce_mean: https://www.tensorflow.org/api_docs/python/tf/math/reduce_mean where the `axis=None` semantics mean "all the axes". So what do you mean by `avoid reducing over any axis`?
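To make the `axis=None` point concrete, here is a minimal plain-Python sketch of the convention (it is not TensorFlow code, and `mean_over_axes` is a hypothetical helper written only for this illustration). With `axis=None` every axis is reduced and a single scalar comes back; passing an explicit axis reduces only that one.

```python
# Plain-Python illustration of the axis convention; not TensorFlow code.
# `mean_over_axes` is a hypothetical helper, assumed only for this sketch.
def mean_over_axes(matrix, axis=None):
    """Mean of a 2-D list of numbers, mirroring the axis=None convention."""
    rows, cols = len(matrix), len(matrix[0])
    if axis is None:
        # Reduce over every axis: the result is a single scalar.
        return sum(sum(row) for row in matrix) / (rows * cols)
    if axis == 0:
        # Collapse the row axis: one mean per column.
        return [sum(row[j] for row in matrix) / rows for j in range(cols)]
    if axis == 1:
        # Collapse the column axis: one mean per row.
        return [sum(row) / cols for row in matrix]
    raise ValueError("axis must be None, 0, or 1 for a 2-D input")


x = [[1.0, 2.0], [3.0, 4.0]]
mean_all = mean_over_axes(x)             # scalar over all elements
mean_axis0 = mean_over_axes(x, axis=0)   # per-column means
mean_axis1 = mean_over_axes(x, axis=1)   # per-row means
```

So "not reducing over any axis" would have to mean skipping the mean entirely, since `axis=None` is the opposite case.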

> Sarvagya can you comment here? (pinged him a link as I can't tag him yet) It was just misspelled. > YoloX sums over the axis internally and then means...

Yes, in the official reference implementation too, the reduction happens more in the BinaryCrossentropy API than in the mean: https://github.com/Megvii-BaseDetection/YOLOX/blob/main/yolox/models/yolo_head.py#L493-L495

Have you checked https://github.com/tensorflow/tensorflow/issues/48845?

@tanzhenyu Please check whether you can reproduce it with `drop_remainder=True` in the `tf.data.Dataset.batch` batch formation
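As a rough illustration of why the `drop_remainder` argument of `tf.data.Dataset.batch` matters here, the plain-Python sketch below (not TensorFlow code; `batch_sizes` is a hypothetical helper) computes the batch sizes you would get with and without dropping the remainder. Without it, the last batch can be smaller than the rest, which is one thing worth ruling out when a failure shows up only on some steps.

```python
# Plain-Python sketch of batch formation; not TensorFlow code.
# `batch_sizes` is a hypothetical helper, assumed only for this illustration.
def batch_sizes(num_examples, batch_size, drop_remainder=False):
    """Return the size of each batch produced from `num_examples` items."""
    full_batches, remainder = divmod(num_examples, batch_size)
    sizes = [batch_size] * full_batches
    if remainder and not drop_remainder:
        # The final, partial batch is kept unless drop_remainder is set.
        sizes.append(remainder)
    return sizes


kept = batch_sizes(10, 4)                          # last batch is partial
dropped = batch_sizes(10, 4, drop_remainder=True)  # only full batches
```

If the problem disappears when only full batches are produced, the partial final batch is a likely suspect.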

It was two months ago, so I don't remember exactly the analysis I did. Can you debug/print the specific batch size near or at the `NaN` step (of course without introducing...