Share implementations for operators based on data type width

Open • robertknight opened this issue 1 year ago • 1 comment

Instantiating copies of all operators for all supported data types increases the binary size and compile time of the library. For example, I'm currently prototyping f16 support, and a naive implementation increased the size of the rten CLI tool by 300 KB (11%).

For operators which merely move or copy data, such as Transpose, we only need different code based on the size of elements. A single instantiation could handle i32, u32 and f32 for example.
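As a rough sketch of the idea (not code from rten), the example below compiles a single transpose loop over 32-bit words and exposes it to `i32`, `u32` and `f32` through a thin generic wrapper. The `Word32` trait and the `transpose_words` and `transpose` functions are hypothetical names invented for this illustration, and a 2D row-major layout is assumed for simplicity.

```rust
use std::mem::{align_of, size_of};
use std::slice;

/// Hypothetical marker trait for plain 32-bit element types.
///
/// Safety: implementors must have the same size and alignment as `u32`,
/// and every 32-bit pattern must be a valid value of the type.
unsafe trait Word32: Copy {}
unsafe impl Word32 for u32 {}
unsafe impl Word32 for i32 {}
unsafe impl Word32 for f32 {}

/// The only copy of the transpose loop that gets compiled.
/// `src` is a row-major `rows x cols` matrix; `dst` receives `cols x rows`.
fn transpose_words(src: &[u32], dst: &mut [u32], rows: usize, cols: usize) {
    assert_eq!(src.len(), rows * cols);
    assert_eq!(dst.len(), rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            dst[c * rows + r] = src[r * cols + c];
        }
    }
}

/// Thin generic wrapper: it only reinterprets the slices, so each
/// instantiation stays tiny compared to the loop itself.
fn transpose<T: Word32>(src: &[T], dst: &mut [T], rows: usize, cols: usize) {
    assert_eq!(size_of::<T>(), size_of::<u32>());
    assert_eq!(align_of::<T>(), align_of::<u32>());
    // SAFETY: `Word32` guarantees that `T` has the layout of `u32` and that
    // any bit pattern written back into `dst` is a valid `T`.
    let src_words = unsafe { slice::from_raw_parts(src.as_ptr().cast::<u32>(), src.len()) };
    let dst_words =
        unsafe { slice::from_raw_parts_mut(dst.as_mut_ptr().cast::<u32>(), dst.len()) };
    transpose_words(src_words, dst_words, rows, cols);
}
```

Calling `transpose(&src, &mut dst, 2, 3)` with `f32` or `i32` slices would go through the same `transpose_words` body. Whether the wrapper should use `unsafe` casts or a safe byte-casting crate is a design choice this sketch leaves open.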

Operators which could use this optimization include:

  • Operators which rely only on elements implementing Copy. This includes all the layout operations.
  • Bitwise operations
  • Sign for certain combinations of types where the sign bit is in the same place
  • Maybe operations which care about bitwise equality with zero. For floats this is complicated by +0.0 vs -0.0.
  • Maybe operations which care about bitwise equality of values. For floats this is complicated by +0.0 vs -0.0, NaN etc.
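The float complication in the last two items can be illustrated with a small sketch. The helper names `bitwise_eq`, `float_eq` and `to_words` are hypothetical and exist only for this example. It compares element-wise equality on raw 32-bit patterns against ordinary `f32` equality.

```rust
/// Hypothetical shared kernel: element-wise equality on raw 32-bit patterns.
fn bitwise_eq(a: &[u32], b: &[u32]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x == y).collect()
}

/// Reference behaviour: element-wise equality with IEEE 754 float semantics.
fn float_eq(a: &[f32], b: &[f32]) -> Vec<bool> {
    a.iter().zip(b).map(|(x, y)| x == y).collect()
}

/// Convert floats to their bit patterns (copies for clarity; a real
/// implementation would presumably reinterpret the buffer in place).
fn to_words(values: &[f32]) -> Vec<u32> {
    values.iter().map(|x| x.to_bits()).collect()
}
```

For inputs `[1.0, 0.0, NAN]` and `[1.0, -0.0, NAN]`, the float comparison treats `0.0` and `-0.0` as equal and `NAN` as unequal to itself, while the bitwise comparison does the opposite in both cases. A shared bitwise kernel would therefore need extra handling before it could stand in for float equality.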

robertknight, Jun 21 '24 08:06

> Maybe operations which care about bitwise equality with zero. For floats this is complicated by +0.0 vs -0.0.

I suppose we could implement, e.g., NonZero for a given bit width with shared code, using a function which receives two different zero values as arguments. When used on types which have only a single zero value, these two values would be the same.
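A minimal sketch of that suggestion, assuming 1D input and 32-bit elements. The function names `nonzero_words`, `nonzero_f32` and `nonzero_i32` are hypothetical and not taken from rten.

```rust
use std::slice;

/// Shared kernel: indices of words whose bit pattern matches neither of the
/// two "zero" patterns supplied by the caller.
fn nonzero_words(data: &[u32], zero_a: u32, zero_b: u32) -> Vec<usize> {
    let mut indices = Vec::new();
    for (i, &word) in data.iter().enumerate() {
        if word != zero_a && word != zero_b {
            indices.push(i);
        }
    }
    indices
}

/// f32 has two zeros, +0.0 and -0.0, which differ only in the sign bit.
fn nonzero_f32(data: &[f32]) -> Vec<usize> {
    // SAFETY: f32 and u32 have the same size and alignment, and every f32
    // bit pattern is a valid u32.
    let words = unsafe { slice::from_raw_parts(data.as_ptr().cast::<u32>(), data.len()) };
    nonzero_words(words, 0.0f32.to_bits(), (-0.0f32).to_bits())
}

/// i32 has a single zero, so the same pattern is passed twice.
fn nonzero_i32(data: &[i32]) -> Vec<usize> {
    // SAFETY: i32 and u32 have the same size and alignment, and every i32
    // bit pattern is a valid u32.
    let words = unsafe { slice::from_raw_parts(data.as_ptr().cast::<u32>(), data.len()) };
    nonzero_words(words, 0, 0)
}
```

NaN values match neither zero pattern, so this sketch reports them as non-zero, which agrees with `NaN != 0.0` being true under float comparison.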

robertknight, Jun 21 '24 08:06