Albert Zeyer

Results: 1028 comments by Albert Zeyer

We do that already. Timings are collected for various things; e.g., computation time is also measured (so you can see whether the dataset is a bottleneck). For (CPU) memory, there is `watch_memory`,...
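As a sketch, such memory watching is typically switched on via the config; only the `watch_memory` name is taken from the comment above, and a RETURNN config is just a Python file, so a fragment could look like:

```python
# Illustrative RETURNN-style config fragment. Only `watch_memory` is taken
# from the comment above; whether it takes a bool or other values here is
# an assumption for illustration.
watch_memory = True  # periodically log CPU memory usage of the process
```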

@NeoLegends A first step would be to remove all code usages of `num_outputs`/`num_inputs` and replace them with `is_data_sparse`, `get_data_shape`, `get_data_dim`, etc. Also, while at it, we should fix datasets which require...
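To illustrate the per-key query API named above (`is_data_sparse`, `get_data_shape`, `get_data_dim`), here is a standalone mock; the `ToyDataset` class and its internals are purely illustrative, not RETURNN's actual `Dataset` implementation:

```python
# Standalone sketch of the per-key dataset API mentioned in the comment.
# ToyDataset is a mock for illustration only, not RETURNN's real Dataset.
class ToyDataset:
    def __init__(self):
        # "data": dense float features of dim 40;
        # "classes": sparse int labels over 10 classes.
        self._dims = {"data": 40, "classes": 10}
        self._sparse = {"data": False, "classes": True}

    def is_data_sparse(self, key: str) -> bool:
        """Whether entries for `key` are sparse indices (no feature axis)."""
        return self._sparse[key]

    def get_data_dim(self, key: str) -> int:
        """Feature dim for dense data, or number of classes for sparse data."""
        return self._dims[key]

    def get_data_shape(self, key: str) -> list:
        """Shape per frame, excluding batch and time axes."""
        return [] if self.is_data_sparse(key) else [self.get_data_dim(key)]


ds = ToyDataset()
assert ds.get_data_shape("data") == [40]   # dense: feature axis present
assert ds.get_data_shape("classes") == []  # sparse: indices, no feature axis
```

The point of such an API over a flat `num_outputs`/`num_inputs` dict is that every data key is queried uniformly, instead of hard-coding which key is "input" and which is "output".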

> Is the eval task intended to be used with PyTorch? There is no reason why it should not be. But we have to see whether it is really easily...

Btw, regarding gradient checkpointing, see this current code as an example for variational noise in our TF code:

```python
if param_variational_noise and param.dtype.is_floating and isinstance(param, tf.Variable):
    with default_control_flow_ctx():  # make...
```

> There is a gradient checkpointing API in PT: https://pytorch.org/docs/stable/checkpoint.html Yes, that is what I referred to when we talked about it. But I need to check in more detail how...

> Yeah it would seem to me like applying only the dropout operation within the gradient checkpointed context might not be enough, but one would have to move more of...
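The core idea under discussion can be shown without any framework: instead of storing the dropout/noise mask from the forward pass, store only the RNG seed and recompute the identical mask later. This is a pure-Python sketch of that mechanism; PyTorch's `torch.utils.checkpoint` does the analogous thing inside autograd, and all function names here are illustrative:

```python
import random

# Sketch of the recompute-instead-of-store idea behind gradient checkpointing,
# applied to dropout. All names here are illustrative.

def dropout_mask(n, p, seed):
    # The mask is a pure function of (n, p, seed), so it need not be stored.
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else 1.0 / (1.0 - p) for _ in range(n)]

def forward(x, p, seed):
    # Apply dropout, but save only (inputs, p, seed) -- not the mask itself.
    mask = dropout_mask(len(x), p, seed)
    y = [xi * mi for xi, mi in zip(x, mask)]
    return y, (x, p, seed)  # saved context stays small

def backward(grad_y, ctx):
    x, p, seed = ctx
    # Recompute the identical mask from the stored seed.
    mask = dropout_mask(len(x), p, seed)
    return [gi * mi for gi, mi in zip(grad_y, mask)]

y1, ctx = forward([1.0, 2.0, 3.0], p=0.5, seed=42)
y2, _ = forward([1.0, 2.0, 3.0], p=0.5, seed=42)
assert y1 == y2  # recomputation is deterministic: same seed, same mask
```

This is also why, as discussed above, the whole noisy computation (not just the dropout op) may need to sit inside the checkpointed region: everything downstream of the mask must be recomputable from the saved seed and inputs.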

(Note, I made a separate issue just for the gradient checkpointing aspect in PyTorch: #1552. So this issue here can just focus on the RF specific question on how to...

So, I tend to reimplement something very similar to the PyTorch parametrization API, also following some of its internal design choices. * I don't want to extend `rf.Module`. It's...
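For reference, the core mechanism of such a parametrization API (in the spirit of `torch.nn.utils.parametrize`) is that attribute access to a parameter goes through a transform applied to an underlying raw value. A minimal standalone sketch, with all class and method names being illustrative rather than the actual RF or Torch API:

```python
# Minimal sketch of a parametrization mechanism in the spirit of
# torch.nn.utils.parametrize. All names here are illustrative.
class Parametrization:
    """Maps a raw stored value to the effective parameter."""
    def forward(self, raw):
        raise NotImplementedError

class Abs(Parametrization):
    # Example transform: keep the effective parameter non-negative.
    def forward(self, raw):
        return [abs(v) for v in raw]

class Module:
    def __init__(self):
        self._raw = {}
        self._parametrizations = {}

    def register_parameter(self, name, value):
        self._raw[name] = value

    def register_parametrization(self, name, parametrization):
        self._parametrizations[name] = parametrization

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, i.e. for parameters.
        raw = self.__dict__["_raw"].get(name)
        if raw is None:
            raise AttributeError(name)
        p = self.__dict__["_parametrizations"].get(name)
        return p.forward(raw) if p else raw


m = Module()
m.register_parameter("weight", [-1.0, 2.0])
m.register_parametrization("weight", Abs())
assert m.weight == [1.0, 2.0]  # transform applied lazily on access
```

The design point is that the transform runs at access time, so e.g. noise or normalization is recomputed on every use instead of being baked into the stored value.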

I also thought about deriving from or extending `rf.Parameter`. I'm not exactly sure how, though. It is currently also a `Tensor`, and I don't think we can make this dynamically evaluate...

I just realized that [Torch AMP already automatically upcasts to f32 for certain ops](https://docs.pytorch.org/docs/stable/amp.html#cuda-ops-that-can-autocast-to-float16). That includes, among others: `layer_norm`, `log`, `log_softmax`, `softmax`, `exp`, `sum`, `nll_loss`, `rsqrt`, `norm`, etc. So, take...
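The autocast behavior described above amounts to a per-op dtype policy: fast ops run in half precision, while numerically sensitive ops are upcast to float32. A simplified standalone sketch of such a policy; the op tables below are a subset based on the linked docs, and the dispatcher itself is an illustration, not Torch's implementation:

```python
# Simplified sketch of an AMP-style per-op dtype policy. The op sets below
# are a subset based on the linked Torch AMP docs; the dispatcher is
# illustrative, not Torch's actual implementation.
F16, F32 = "float16", "float32"

F16_OPS = {"matmul", "conv2d"}  # fast path: run in half precision
F32_OPS = {  # numerically sensitive: autocast upcasts these to float32
    "layer_norm", "log", "log_softmax", "softmax", "exp",
    "sum", "nll_loss", "rsqrt", "norm",
}

def autocast_dtype(op: str, input_dtype: str) -> str:
    """Pick the compute dtype for `op` inside an autocast region."""
    if op in F32_OPS:
        return F32  # upcast for numerical stability
    if op in F16_OPS:
        return F16
    return input_dtype  # ops without a rule keep the input dtype

assert autocast_dtype("softmax", F16) == F32  # upcast despite f16 inputs
assert autocast_dtype("matmul", F32) == F16
```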