spire icon indicating copy to clipboard operation
spire copied to clipboard

IEEE 754 half-precision data type

Open bashimao opened this issue 10 years ago • 21 comments

I know this is not an easy task. But probably a rewarding one for many applications that require a high memory bandwidth. I really would like to have half-precision float values in spire.

Description of IEEE 754 binary16: https://en.wikipedia.org/wiki/Half-precision_floating-point_format http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=4610935

bashimao avatar Aug 12 '15 21:08 bashimao

So you are imagining interpreting a Short (or Char) value as a 16-bit floating point number?

non avatar Aug 13 '15 00:08 non

It would probably be relatively straightforward to support encoding/decoding Float or Double values as 16-bit floating point. Emulating the arithmetic operations would be painfully slow -- it might be faster to just support decoding/encoding.

non avatar Aug 13 '15 03:08 non

You could have a value class that wraps a short and supports implicit conversions to/from float/double. That would be relatively convenient to use. But the problem with that would be that arrays of value classes are boxing. And if you really care that much about compact in memory representation you will want to use arrays. So I think the best thing we could do would be just to have conversion methods from/to double that handle all the bit-twiddling

rklaehn avatar Aug 13 '15 07:08 rklaehn

We could wrap an Array[Short] in a value class ShortArray[T](val data: Array[Short]) extends AnyVal, and then have the apply method on ShortArray take a type class CodeShort[T] to code/decode shorts.

Haven't tried it however!

denisrosset avatar Aug 13 '15 08:08 denisrosset

@rklaehn: I kind of agree. Otherwise you would probably have to mimic floating point exceptions. By just encoding/decoding on the fly, you avoid most of that hazzle. We just have to make sure that we map NaN values properly to their 16 bit representations. If my understanding is right, IEEE binary16 cannot distinguish between INF and NaN. However, the Wikipedia article contains a reference to the ARM-encoding which apparently can encode NaN and INF differently. For the sake of compatibility it would probably be advisable to go with that encoding. https://en.wikipedia.org/wiki/Half-precision_floating-point_format#ARM_alternative_half-precision

bashimao avatar Aug 13 '15 08:08 bashimao

For the encoding/decoding part. I found this guy who almost managed to do the wrapping properly in C#: http://codereview.stackexchange.com/questions/45007/half-precision-reader-writer-for-c Same for Java: http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687 Some more info about the conversion process: ftp://ftp.fox-toolkit.org/pub/fasthalffloatconversion.pdf

bashimao avatar Aug 13 '15 08:08 bashimao

This SO answer seems to contain all we need. The question is whether we can use this. It says that it is public domain. But I asked the author anyway on SO.

http://stackoverflow.com/questions/6162651/half-precision-floating-point-in-java/6162687#6162687

rklaehn avatar Aug 13 '15 09:08 rklaehn

@denisrosset that would probably work. However, I am not sure if stuff like this would be in scope for spire. The conversion function (bit twiddling) definitely belongs to spire IMHO.

rklaehn avatar Aug 13 '15 09:08 rklaehn

@rklaehn agree with the scope.

However, it would be good to formalize the encoding/decoding process using a specialized typeclass; FastComplex also falls into this category.

I will see after my PhD defense (today!) what I can propose.

denisrosset avatar Aug 13 '15 09:08 denisrosset

@denisrosset Good Luck!

Agree about a typeclass for compact encoding. I do this all the time (see e.g. IntervalTrie.Element in https://github.com/rklaehn/intervalset/blob/master/src/main/scala/com/rklaehn/interval/IntervalTrie.scala ) and would love to have a more generic infrastructure.

rklaehn avatar Aug 13 '15 10:08 rklaehn

I should have commented here -- I wrote an implementation on the train last night that can convert between a Short (using the Wikipedia 16-bit encoding) and Double "correctly" (it seems correct but I need to do some more research about how the rounding is supposed to work and write more tests).

non avatar Aug 13 '15 11:08 non

@denisrosset Also, good luck on your defense!

non avatar Aug 13 '15 11:08 non

@non I got permission from SO user x4u to use his code, in case we still need it.

rklaehn avatar Aug 13 '15 16:08 rklaehn

Thanks, you guys are great!

bashimao avatar Aug 14 '15 04:08 bashimao

Here's the code I wrote: https://gist.github.com/non/29f8d66036afca402f96

I haven't read the StackOverflow code yet -- I'll check it out now. Maybe the author has a better strategy.

EDIT: The author's code seems better because it doesn't use Float/Double operations but just reads/creates the bytes itself. I need to read more about the author's extensions though.

non avatar Aug 14 '15 06:08 non

@non: Thanks! As one might expect, I cannot wait to use this. I just pasted your code into a Scala worksheet to play around with it. But unfortunately it fails to compile the Float16 class. Reason:

Error:(..., ...) value class may not be a member of another class
class Float16(val raw: Short) extends AnyVal { lhs =>

bashimao avatar Aug 17 '15 04:08 bashimao

@bashimao this is probably an artefact of how worksheets are compiled. I guess they are compiled as a class, and inner value classes are not allowed. Just remove the extends AnyVal for playing around with it. That will make it slower, but still good enough for playing around with it.

rklaehn avatar Aug 17 '15 07:08 rklaehn

1/32 of the possible value space is lost with NaNs. Why are you insisting on using IEE standard? It is not like you are ever going to binary cast it, this is not C. With those bits, you could boost already poor precision.

strelec avatar Aug 20 '15 20:08 strelec

I tend to disagree. From a greedy perspective it surely looks like a waste. But in practice, encoding +INF, -INF and NaN separately is quite useful. Otherwise, it is almost impossible to figure out issues with long computations. Getting +INF/-INF gives you a hint that you have scratched the boundaries of the data-type. Without that you would have silent errors, that make the numeric outcomes of your computation unreliable. Expanding the possible value-range is always nice. But not at the expense of retaining information about erroneous states. Just think about the following case:

val a: Float = +INF
val b: Half = a.toHalf

Now b, instead of being +INF is set to Half.MaxValue.

val c: Float = b.toFloat

Should you now convert c to Float.PositiveInfinity or whatever value Half.MaxValue is in Float? Which is the natural choice? What would you expect? Consider the following:

val d: Float = c / 2.0f
val e: Half = d.toHalf

What should half be? I would expect INF/2 = INF again, since it is impossible to determine. However, without the ability to encode INF, you will have a silent error that becomes more problematic with each computation step.

bashimao avatar Aug 24 '15 07:08 bashimao

@strelec If you just want to get a maximum of precision from a 16bit value without binary compatibility to IEE standards, maybe something like spire.math.FixedPoint for 16 bit might be a good approach.

rklaehn avatar Aug 24 '15 08:08 rklaehn

@bashimao: I was not talking about INF and -INF. Those would stay. Same way, NaN would stay, but only one instance of it.

strelec avatar Aug 24 '15 10:08 strelec