Albert Zeyer comments

Results: 1028 comments of Albert Zeyer

Group normalization documentation is incorrect

Yes, I have. Are you saying I'm wrong, or just double checking? The code for group-normalization is (with added comments, and prefix, for case G=1): ``` B, T, F =...

Group normalization documentation is incorrect

> So in the original comment, did you mean `moments` on `[T,F]` with `G=1` vs. `moments` on `[F]` with layer normalization? Yes, exactly. That's not the same. The behavior is very...
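To make the distinction concrete, here is a minimal sketch in plain Python (not the code of any library discussed in the thread). It assumes a single batch entry laid out as `[T][F]`, omits the learned `gamma`/`beta`, and uses hypothetical helper names (`normalize`, `group_norm_g1`, `layer_norm`). It contrasts moments taken over `[T,F]` (group norm with `G=1`) with moments taken over `[F]` only (layer norm).

```python
from statistics import fmean, pvariance

EPS = 1e-5  # assumed epsilon, chosen only for this illustration


def normalize(values, eps=EPS):
    """Normalize a flat list to zero mean and (approximately) unit variance."""
    mean = fmean(values)
    var = pvariance(values, mu=mean)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


def group_norm_g1(x):
    """x: [T][F] for one batch entry. Moments over all of [T, F] (case G=1)."""
    num_frames, num_feats = len(x), len(x[0])
    flat = normalize([v for frame in x for v in frame])
    return [flat[t * num_feats:(t + 1) * num_feats] for t in range(num_frames)]


def layer_norm(x):
    """x: [T][F]. Moments over [F] only, computed separately per time frame."""
    return [normalize(frame) for frame in x]


# Two time frames with very different scales.
x = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
gn = group_norm_g1(x)
ln = layer_norm(x)
```

With this input, `layer_norm` maps both frames to nearly the same values because each frame is rescaled on its own, while `group_norm_g1` keeps the scale difference between frames: the first frame ends up entirely below the shared mean.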

Group normalization documentation is incorrect

I think the GroupNorm implementation is fine. At least it seems to be what is commonly implemented. I think the LayerNorm implementation is also fine. This seems to be the...

Group normalization documentation is incorrect

Yes, right. If you use LN and configure it to normalize over all axes (except batch), then `gamma`/`beta` are wrong. That is not even possible for my example, as T...

MultiProcDataset + Postprocessing = CPU overcommit?

> resulting in a CPU overcommit (because the number of assigned CPUs matches the number of data processes), adversely affecting performance. How do you know this is really negatively affecting...

MultiProcDataset + Postprocessing = CPU overcommit?

> Look at the load values. The job is assigned 48 cores by SLURM, but seems to produce a 15-minute load average of ~190. But why is this bad? What I...

MultiProcDataset + Postprocessing = CPU overcommit?

I explained this already: When considering hyperthreading and/or locality of data, I can imagine that multiple threads per worker can be beneficial anyway (independent of how much other workers...

Hang in training (often with multi GPU training)

Sometimes also like this: ``` ... ep 28 train, step 56, ctc_4 2.616, ctc_8 2.268, ctc 2.221, num_seqs 8, max_size:time 278344, max_size:out-spatial 67, mem_usage:cuda:0 6.3GB, 0.658 sec/step ep 28 train,...

No registered 'Const' OpKernel for GPU devices with constant folding

I just ran the [Colab from above](https://colab.research.google.com/gist/tilakrayal/6996de1c84dbdf1431043370d1c9ea08/52200.ipynb) with recent TF 2.17.0, and the same error still occurs.

Gradient checkpointing for weight noise etc in PyTorch

I'm reading the code of `_checkpoint_without_reentrant_generator`. It looks like this uses a couple of techniques which are very relevant for what we need: There is logic for `preserve_rng_state`. It gets...
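The general idea behind preserving RNG state for recomputation can be sketched as follows. This is not PyTorch's implementation: it uses Python's stdlib `random` module as a stand-in for the framework RNG, and `noisy_forward` and `checkpointed` are hypothetical names introduced only for illustration. The point is that a recomputed forward pass with weight noise must draw exactly the same noise as the original pass.

```python
import random


def noisy_forward(weight, x, noise_std=0.1):
    """Hypothetical forward pass: perturb the weight with Gaussian noise."""
    noise = random.gauss(0.0, noise_std)
    return (weight + noise) * x


def checkpointed(fn, *args):
    """Run fn once, remembering the RNG state it started from.

    Returns the output and a recompute() callable that replays fn with the
    saved RNG state, then restores the state the RNG had before the replay.
    """
    saved_state = random.getstate()
    out = fn(*args)

    def recompute():
        outer_state = random.getstate()
        random.setstate(saved_state)
        try:
            return fn(*args)
        finally:
            # Do not disturb the RNG stream of the surrounding code.
            random.setstate(outer_state)

    return out, recompute


random.seed(0)
out, recompute = checkpointed(noisy_forward, 1.0, 2.0)
unrelated = random.random()  # the RNG advances between forward and recompute
replayed = recompute()       # same noise as the first pass
```

Without the saved state, the replay would draw fresh noise and the recomputed activations would no longer match the ones the forward pass produced.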