Albert Zeyer

Results: 1028 comments by Albert Zeyer

We do that already. Timings are collected for various things; e.g., computation time is also measured (so you can see whether the dataset is a bottleneck). For (CPU) memory, there is `watch_memory`,...
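As a sketch, such memory watching is typically switched on via the config; only the `watch_memory` name is taken from the comment above, and a RETURNN config is just a Python file, so a fragment could look like:

```python
# Illustrative RETURNN-style config fragment. Only `watch_memory` is taken
# from the comment above; whether it takes a bool or other values here is
# an assumption for illustration.
watch_memory = True  # periodically log CPU memory usage of the process
```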

@NeoLegends A first step would be to remove all code usages of `num_outputs`/`num_inputs` and replace them with `is_data_sparse`, `get_data_shape`, `get_data_dim`, etc. Also, while at it, we should fix datasets which require...
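To illustrate the per-key query API named above (`is_data_sparse`, `get_data_shape`, `get_data_dim`), here is a standalone mock; the `ToyDataset` class and its internals are purely illustrative, not RETURNN's actual `Dataset` implementation:

```python
# Standalone sketch of the per-key dataset API mentioned in the comment.
# ToyDataset is a mock for illustration only, not RETURNN's real Dataset.
class ToyDataset:
    def __init__(self):
        # "data": dense float features of dim 40;
        # "classes": sparse int labels over 10 classes.
        self._dims = {"data": 40, "classes": 10}
        self._sparse = {"data": False, "classes": True}

    def is_data_sparse(self, key: str) -> bool:
        """Whether entries for `key` are sparse indices (no feature axis)."""
        return self._sparse[key]

    def get_data_dim(self, key: str) -> int:
        """Feature dim for dense data, or number of classes for sparse data."""
        return self._dims[key]

    def get_data_shape(self, key: str) -> list:
        """Shape per frame, excluding batch and time axes."""
        return [] if self.is_data_sparse(key) else [self.get_data_dim(key)]


ds = ToyDataset()
assert ds.get_data_shape("data") == [40]   # dense: feature axis present
assert ds.get_data_shape("classes") == []  # sparse: indices, no feature axis
```

The point of such an API over a flat `num_outputs`/`num_inputs` dict is that every data key is queried uniformly, instead of hard-coding which key is "input" and which is "output".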

> Is the eval task intended to be used with PyTorch? There is no reason why it should not be. But we have to see whether it is really easily...

Btw, regarding gradient checkpointing, see this current code as an example for variational noise in our TF code:

```python
if param_variational_noise and param.dtype.is_floating and isinstance(param, tf.Variable):
    with default_control_flow_ctx():  # make...
```

> There is a gradient checkpointing API in PT: https://pytorch.org/docs/stable/checkpoint.html Yes, that is what I referred to when we talked about it. But I need to check in more detail how...

> Yeah it would seem to me like applying only the dropout operation within the gradient checkpointed context might not be enough, but one would have to move more of...
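The core idea under discussion can be shown without any framework: instead of storing the dropout/noise mask from the forward pass, store only the RNG seed and recompute the identical mask later. This is a pure-Python sketch of that mechanism; PyTorch's `torch.utils.checkpoint` does the analogous thing inside autograd, and all function names here are illustrative:

```python
import random

# Sketch of the recompute-instead-of-store idea behind gradient checkpointing,
# applied to dropout. All names here are illustrative.

def dropout_mask(n, p, seed):
    # The mask is a pure function of (n, p, seed), so it need not be stored.
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]

def forward(x, p, seed):
    # Apply dropout, but save only (inputs, p, seed) -- not the mask itself.
    mask = dropout_mask(len(x), p, seed)
    y = [xi * mi for xi, mi in zip(x, mask)]
    return y, (x, p, seed)  # saved context stays small

def backward(grad_y, ctx):
    x, p, seed = ctx
    # Recompute the identical mask from the stored seed.
    mask = dropout_mask(len(x), p, seed)
    return [gi * mi for gi, mi in zip(grad_y, mask)]

y1, ctx = forward([1.0, 2.0, 3.0], p=0.5, seed=42)
y2, _ = forward([1.0, 2.0, 3.0], p=0.5, seed=42)
assert y1 == y2  # recomputation is deterministic: same seed, same mask
```

This is also why, as discussed above, the whole noisy computation (not just the dropout op) may need to sit inside the checkpointed region: everything downstream of the mask must be recomputable from the saved seed and inputs.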

(Note, I made a separate issue just for the gradient checkpointing aspect in PyTorch: #1552. So this issue here can just focus on the RF specific question on how to...

So, I tend to reimplement something very similar to the PyTorch parametrization API, also following some of its internal design choices. * I don't want to extend `rf.Module`. It's...
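For reference, the core mechanism of such a parametrization API (in the spirit of `torch.nn.utils.parametrize`) is that attribute access to a parameter goes through a transform applied to an underlying raw value. A minimal standalone sketch, with all class and method names being illustrative rather than the actual RF or Torch API:

```python
# Minimal sketch of a parametrization mechanism in the spirit of
# torch.nn.utils.parametrize. All names here are illustrative.
class Parametrization:
    """Maps a raw stored value to the effective parameter."""
    def forward(self, raw):
        raise NotImplementedError

class Abs(Parametrization):
    # Example transform: keep the effective parameter non-negative.
    def forward(self, raw):
        return [abs(v) for v in raw]

class Module:
    def __init__(self):
        self._raw = {}
        self._parametrizations = {}

    def register_parameter(self, name, value):
        self._raw[name] = value

    def register_parametrization(self, name, parametrization):
        self._parametrizations[name] = parametrization

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for parameters.
        raw = self.__dict__["_raw"].get(name)
        if raw is None:
            raise AttributeError(name)
        p = self.__dict__["_parametrizations"].get(name)
        return p.forward(raw) if p else raw


m = Module()
m.register_parameter("weight", [-1.0, 2.0])
m.register_parametrization("weight", Abs())
assert m.weight == [1.0, 2.0]  # transform applied lazily on access
```

The design point is that the transform runs at access time, so e.g. noise or normalization is recomputed on every use instead of being baked into the stored value.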

I also thought about deriving from or extending `rf.Parameter`. I'm not exactly sure how, though. It is currently also a `Tensor`, and I don't think we can make this dynamically evaluate...

I just realized that [Torch AMP already automatically upcasts to f32 for certain ops](https://docs.pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16). That includes, among others: `layer_norm`, `log`, `log_softmax`, `softmax`, `exp`, `sum`, `nll_loss`, `rsqrt`, `norm`, etc. So, take...
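The autocast behavior described above amounts to a per-op dtype policy: fast ops run in half precision, while numerically sensitive ops are upcast to float32. A simplified standalone sketch of such a policy; the op tables below are a subset based on the linked docs, and the dispatcher itself is an illustration, not Torch's implementation:

```python
# Simplified sketch of an AMP-style per-op dtype policy. The op sets below
# are a subset based on the linked Torch AMP docs; the dispatcher is
# illustrative, not Torch's actual implementation.
F16, F32 = "float16", "float32"

F16_OPS = {"matmul", "conv2d"}  # fast path: run in half precision
F32_OPS = {  # numerically sensitive: autocast upcasts these to float32
    "layer_norm", "log", "log_softmax", "softmax", "exp",
    "sum", "nll_loss", "rsqrt", "norm",
}

def autocast_dtype(op: str, input_dtype: str) -> str:
    """Pick the compute dtype for `op` inside an autocast region."""
    if op in F32_OPS:
        return F32  # upcast for numerical stability
    if op in F16_OPS:
        return F16
    return input_dtype  # ops without a rule keep the input dtype

assert autocast_dtype("softmax", F16) == F32  # upcast despite f16 inputs
assert autocast_dtype("matmul", F32) == F16
```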