
Better arbitrary dtype support, e.g. bfloat16

Open · albertz opened this issue 3 years ago · 1 comment

In principle, RETURNN supports arbitrary dtypes, as Data can have any dtype. However, many layers do not really allow configuring it. Most layers just take the same dtype as their input, so in principle, you could also put a CastLayer wherever needed. But there are also a couple of places where float32 is hardcoded.
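
As a rough sketch of the current workaround, assuming the TF backend: a CastLayer ("class": "cast") in front of the relevant layers, with all layer names and dimensions here being made up for illustration.

```python
# Hypothetical excerpt of a RETURNN network dict (layer names and dims are illustrative).
network = {
    # Cast the (float32) input to bfloat16 before feeding it to the rest of the network.
    "data_bf16": {"class": "cast", "from": "data", "dtype": "bfloat16"},
    # Most layers keep the dtype of their input, so "linear1" would be bfloat16 here,
    # unless the layer internally hardcodes float32 somewhere.
    "linear1": {"class": "linear", "from": "data_bf16", "n_out": 512, "activation": "relu"},
    "output": {"class": "softmax", "from": "linear1", "n_out": 1000, "loss": "ce"},
}
```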

How should this be configured? Maybe there should be a global option default_float_type or similar (similar to PyTorch). Otherwise, this could be an option of a layer, but maybe only when relevant, i.e. if the user could just explicitly put a cast layer in front and everything would then work, I think a layer option is not really needed. You can already have bfloat16 in extern_data, and it might all work already.
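
A minimal config sketch of what this could look like; note that default_float_type is the option proposed in this issue and does not exist yet, and the extern_data entries are illustrative.

```python
# Hypothetical global fallback dtype, as proposed in this issue (not an existing option).
default_float_type = "bfloat16"

extern_data = {
    # bfloat16 inputs are already possible today via the dtype field of Data.
    "data": {"dim": 80, "dtype": "bfloat16"},
    "classes": {"dim": 1000, "sparse": True, "dtype": "int32"},
}
```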

So, all hardcoded float32 should be removed, either via a layer option, or otherwise using this global option as a fallback.
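
A minimal sketch of the intended resolution order inside a layer, assuming the proposed global option; the function and the plain-dict config here are illustrative, not existing RETURNN API.

```python
def resolve_float_dtype(layer_opts: dict, config: dict) -> str:
    """Per-layer "dtype" option wins, then the global default, then float32."""
    if "dtype" in layer_opts:
        return layer_opts["dtype"]
    # Hypothetical global option from this issue; float32 stays the final fallback.
    return config.get("default_float_type", "float32")
```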

And then, I guess this needs some testing. Maybe directly on TPU (see #1162).

albertz · Oct 20 '22 07:10

Note: for the PyTorch backend, we have exactly that now, called default_float_dtype. It is probably very simple to adapt it for any RF usage (also including TF).
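
A minimal sketch of how this might look in a config for the PyTorch backend; the option name default_float_dtype is taken from the comment above, everything else (placement, value) is assumed.

```python
# Hypothetical RETURNN config snippet for the PyTorch backend.
backend = "torch"
default_float_dtype = "bfloat16"

# For comparison, PyTorch's own global switch (whether bfloat16 is accepted as a
# default dtype may depend on the PyTorch version):
# import torch
# torch.set_default_dtype(torch.bfloat16)
```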

albertz · Nov 06 '24 22:11