Faster data unpacking (Different to #1291)
Here is a faster implementation of the data unpacking (src/finn/util/data_packing.py::packed_bytearray_to_finnpy). This implementation is different to the one seen in #1291.
While very efficient, mdanilow's variant has weaknesses: it does not support SIMD>1, and it does not support some data types such as fixed point and floating point.
This PR addresses these problems, as well as the performance problems of the current implementation, by adding a dedicated unpacking routine for each datatype category.
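To illustrate what per-category unpacking could look like, here is a minimal sketch. It is not the code from this PR: the function names (unpack_raw_fields, unpack_by_category), the category strings, and the bit layout (MSB-first, fields taken from the low end of the bit stream) are assumptions for illustration only, and floating point is left out.

```python
import numpy as np


def unpack_raw_fields(packed: np.ndarray, bitwidth: int, count: int) -> np.ndarray:
    """Return `count` unsigned fields of `bitwidth` bits each.

    Assumed layout (hypothetical, not necessarily FINN's): bits are read
    MSB-first and the fields occupy the low end of the bit stream.
    """
    bits = np.unpackbits(packed.astype(np.uint8))
    bits = bits[bits.size - bitwidth * count:].reshape(count, bitwidth)
    # Weight of each bit position inside one field, most significant first.
    weights = 1 << np.arange(bitwidth - 1, -1, -1, dtype=np.int64)
    return bits.astype(np.int64) @ weights


def unpack_by_category(packed, category, bitwidth, count, frac_bits=0):
    """Dispatch on a datatype category: 'uint', 'int' or 'fixed' (assumed names)."""
    raw = unpack_raw_fields(packed, bitwidth, count)
    if category == "uint":
        return raw
    # Two's complement sign extension, done once for the whole array.
    signed = np.where(raw >= (1 << (bitwidth - 1)), raw - (1 << bitwidth), raw)
    if category == "int":
        return signed
    if category == "fixed":
        # Signed fixed point: scale by the number of fractional bits.
        return signed / float(1 << frac_bits)
    raise ValueError(f"unsupported category: {category}")
```

The point of the split is that every category shares the vectorized bit extraction and only differs in a cheap array-wide post-processing step, so no per-element Python loop is needed.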
Furthermore, I removed the inference of the output_shape, as it is ambiguous: e.g. a single byte can store 1 to 4 UINT2 numbers.
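The ambiguity can be shown with a small stand-alone sketch. The helper unpack_uint2 is hypothetical and not part of this PR, and it assumes the values sit in the low bits of the byte with the most significant value first:

```python
def unpack_uint2(byte: int, count: int) -> list[int]:
    """Read `count` UINT2 values from the low bits of one byte (assumed layout)."""
    if not 1 <= count <= 4:
        raise ValueError("one byte holds between 1 and 4 UINT2 values")
    # Value i sits at bit offset 2*i; emit the most significant value first.
    return [(byte >> (2 * i)) & 0b11 for i in reversed(range(count))]


packed = 0b00011011
print(unpack_uint2(packed, 4))  # [0, 1, 2, 3]
print(unpack_uint2(packed, 2))  # [2, 3]
print(unpack_uint2(packed, 1))  # [3]
```

The same byte yields three different, equally valid results, so the element count cannot be recovered from the packed data alone and the caller has to pass the output shape explicitly.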
I ran a test comparing the current implementation to my variant:
For the first test I assumed an input of shape (10, 32, 32, 8, 1) with different datatypes, which is packed into a byte array. Then I unpacked it with both variants. Here are the speedups:
For the second test I assumed an input of shape (10, 8, 1). One can see that the speedup decreases, but is still substantial: