
Fixed-point number support?

Open DaZombieKiller opened this issue 5 years ago • 18 comments

I discovered Zig yesterday and have spent pretty much all my time since then delving into it and reading various materials on it. Loving it so far.

One feature that I feel is missing currently is built-in support for fixed-point numbers. As the language doesn't allow operator overloading, this isn't something I could implement in user code and use like any other numeric type. For reference, the Embedded C standard specifies support for such numbers (refer to Annex A), although I'm not aware of any C compiler that supports them for a desktop target.

I found it interesting that Zig offers support for variable-width integers through i# and u#. I wonder if this could be applied to fixed-point numbers by supporting some form of Q-notation, perhaps?

DaZombieKiller avatar Feb 17 '19 18:02 DaZombieKiller

LLVM has

  • @llvm.smul.fix.*
  • @llvm.umul.fix.*

https://llvm.org/docs/LangRef.html#fixed-point-arithmetic-intrinsics

But that's it. Interesting. I'll have to learn why these intrinsics are provided. It may be that some architectures provide some fixed point multiplication operations.

The null hypothesis here is that you would have to implement fixed-point in userland, and yes operations would be function calls rather than overloaded operators.
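
For illustration, a minimal userland sketch of that null hypothesis (the Q16.16 layout and all names here are mine, purely illustrative):

// 16 integer bits, 16 fractional bits, stored in an i32.
const Q16_16 = struct {
    raw: i32,

    fn add(a: Q16_16, b: Q16_16) Q16_16 {
        // addition needs no rescaling; same as integer addition
        return .{ .raw = a.raw + b.raw };
    }

    fn mul(a: Q16_16, b: Q16_16) Q16_16 {
        // widen, multiply, then shift out the doubled scale factor
        return .{ .raw = @intCast(i32, (@as(i64, a.raw) * b.raw) >> 16) };
    }
};

// instead of `c = a + b * b`, one writes:
// const c = a.add(b.mul(b));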

andrewrk avatar Feb 18 '19 16:02 andrewrk

Several architectures provide fixed-point multiplication operations. Atmel's AVR UC3 provides fixed-point DSP arithmetic, but has no support for floating-point representations. This means that e.g. FFTs are more performant than on comparable microcontrollers with an FPU. This is not unusual for a microcontroller specialized for DSP applications.

ryanavella avatar Feb 19 '19 00:02 ryanavella

Most processors that I use at work don't even have a hardware multiply. Generally we work in ADC counts, but sometimes we will work in fixed-point encoded in 8 or 16 bit integers.

It seems to me that, if LLVM IR has only minimal support for fixed-point numbers (a data type but no operations), it wouldn't be terribly difficult to map fixed-point operations to sequences of bit-shifts and integer operations.

Diltsman avatar Feb 27 '19 03:02 Diltsman

Fixedpoint Numbers

A smooth upgrade for integers which approximates real numbers and maximizes zen.

Why do we want this?

They are useful for:

  • financial systems
  • software that benefits from both cross-platform determinism and fractional precision, such as video games using lockstep networking
  • software running on hardware with no floating-point support
  • software designed to maximize performance of small real numbers using SIMD operations, such as machine learning

What is this and how do I use it?

Fixedpoint numbers are an alternative approximation of real numbers, compared to the floats you're probably familiar with. They're integer hardware under the hood, but with a specific number of bits allocated to represent the integer part, and a specific number to represent the fractional part.

In concept, they look like this:

// 2.8125, with 4 bits allocated for the integer part, and 4 bits allocated for the fractional part
0010.1101
  ^  ^^ ^
  |  || '---- 0.0625 (1/16)
  |  |'------ 0.25   (1/4)
  |  '------- 0.5    (1/2)
  '---------- 2

In code, they look like this:

// -1.5, in signed fixedpoint, with 4 bits allocated for the integer, 
// and 4 for the fractional
const negative_one_point_five: i4_4 = -1.5;

// 0.25, in unsigned fixedpoint, with zero bits allocated for the integer, 
// and 2 for the fractional
const one_quarter: u0_2 = 0.25;

// Vector of 8 unsigned fixedpoints, with 20 bits allocated for the integer part, 
// and 12 for the fractional part, with each lane initialized to the closest 
// possible value to 1337.1337
const lots_of_numbers: @Vector(8, u20_12) = @splat(8, @truncateLow(u20_12, 1337.1337));

u#_# and i#_# represent unsigned and signed fixedpoint numbers, respectively, with the first number representing the integer bit-width, and the second number (after the _) representing the fractional bit-width. These bit-widths define its radix point, and are what we call the fixedpoint's format.

Zig supports arbitrary bit-width fixedpoint. The maximum allowed total bit-width of a fixedpoint type (i.e., the sum of the integer and fractional bit-widths) is 65535.
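
Under the hood, a fixedpoint value is just an integer scaled by 2^(fractional bit-width). A quick sketch of that correspondence in today's Zig, with a plain i16 standing in for the proposed i8_8 (the 0x0140 value is mine, worked out by hand):

const std = @import("std");

// An i8_8 value v would be stored as raw = v * 2^8 in an i16.
// 1.25 * 256 = 320 = 0x0140:
const fractional_bits = 8;
const raw = @floatToInt(i16, 1.25 * (1 << fractional_bits));

comptime {
    std.debug.assert(raw == 0x0140);
}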

If an assigned comptime value does not fit perfectly into the format, it is a compile error.

const impossible: i0_2 = 0.125; // Compile error: literal cannot fit into fixedpoint format

If the format has a fractional bit-width of 0, it is called an integer. In Zig, fixedpoint numbers and integers are actually one and the same!

const one_point_two_five: i8_8 = 1.25;
const golly_gee: i8 = @fixCast(i8, one_point_two_five); // 1

Some operations require 'integers' as input. If you try to use a non-integer fixedpoint in them, you'll get a compile error.

const a: i2_4 = 3.25;
const shift_amount: i5_3 = 2.5;
const oops = a << shift_amount; // Compile error: shift_amount must be an integer

Fixedpoint numbers have the following operations defined:

  • Addition (Wrapping) +%

  • Addition (Non-Wrapping) +

  • Addition (Overflowing) @addWithOverflow

  • Addition (Saturating) #1284

  • Subtraction (Wrapping) -%

  • Subtraction (Non-Wrapping) -

  • Subtraction (Overflowing) @subWithOverflow

  • Subtraction (Saturating) #1284

  • Multiplication (Wrapping) *%

  • Multiplication (Non-Wrapping) *

  • Multiplication (Overflowing) @mulWithOverflow

  • Multiplication (Saturating) #1284

  • Division (Unsigned, Const Signed) /

  • Division (Signed) @divExact, @divRoundTowardZero, @divRoundTowardNeg

  • Remainder @rem

  • Modulo %, @mod

  • Left Shift (Truncating) <<, <<=

  • Left Shift (Non-Truncating) @shlExact

  • Left Shift (Overflowing) @shlWithOverflow

  • Sticky Left Shift (Truncating) <<<, <<<= #5220

  • Sticky Left Shift (Non-Truncating) @shlStickyExact #5220

  • Sticky Left Shift (Overflowing) @shlStickyWithOverflow #5220

  • Right Shift (Truncating) >>, >>=

  • Right Shift (Non-Truncating) @shrExact

  • Right Shift (Overflowing) @shrWithOverflow

  • Sticky Right Shift (Truncating) >>>, >>>= #5220

  • Sticky Right Shift (Non-Truncating) @shrStickyExact #5220

  • Sticky Right Shift (Overflowing) @shrStickyWithOverflow #5220

  • Negation (Wrapping) -

  • Negation (Non-Wrapping) @negExact

  • Negation (Saturating) #1284

  • Negation (Overflowing) @negWithOverflow

  • Bitwise And &, &=

  • Bitwise Or |, |=

  • Bitwise Xor ^, ^=

  • Implicit widening coercion (as long as the individual integer bit-width and fractional bit-width fit into the destination type, the value will be implicitly shifted to fit; see the example after the end of this list)

  • Explicit Truncating Floatingpoint to Fixedpoint conversion @floatToFix(comptime T: type, float: var) T (Converts a floatingpoint to a fixedpoint, truncating as necessary.)

  • Explicit Exact Floatingpoint to Fixedpoint conversion @floatToFixExact(comptime T: type, float: var) T (Converts a floatingpoint to a fixedpoint. If the floatingpoint number cannot fit in the destination type, it invokes safety-checked Undefined Behavior.)

  • Explicit Truncating Fixedpoint to Floatingpoint conversion @fixToFloat(comptime T: type, fix: var) T (Converts a fixedpoint to a floatingpoint, truncating as necessary.)

  • Explicit Exact Fixedpoint to Floatingpoint conversion @fixToFloatExact(comptime T: type, fix: var) T (Converts a fixedpoint to a floatingpoint. If the fixedpoint number cannot fit in the destination type, it invokes safety-checked Undefined Behavior.)

  • Explicit Truncating Fixedpoint to Fixedpoint Cast @fixCast(comptime T: type, value: var) T (Converts a fixedpoint type to another fixedpoint, truncating the value as necessary.)

  • Explicit Exact Fixedpoint to Fixedpoint Cast @fixCastExact(comptime T: type, value: var) T (Converts a fixedpoint to another fixedpoint while keeping the same numerical value. Attempting to convert a number which is out of range of the destination type results in safety-checked Undefined Behavior.)

  • Truncate High Bits @truncateHigh(comptime T: type, value: var) T (Truncates high bits from a fixedpoint type, resulting in a smaller or same-sized fixedpoint type composed of the low bits, which may no longer store a sensible fixedpoint value. Useful for unsafe bit-surgery that @fixCast doesn't allow. e.g. const a: i4_4 = ...; const b = @truncateHigh(i2, a);)

  • Truncate Low Bits @truncateLow(comptime T: type, value: var) T (Truncates low bits from a fixedpoint type, resulting in a smaller or same-sized fixedpoint type composed of the high bits, which may no longer store a sensible fixedpoint value. Useful for unsafe bit-surgery that @fixCast doesn't allow. e.g. const a: u2_6 = ...; const b = @truncateLow(u3, a);)

  • Floor @roundTowardNeg

  • Ceil @roundTowardPositive

  • Trunc @roundTowardZero

  • Round @roundTowardInfinity

  • Bit Cast @bitCast

  • Bit Reverse @bitReverse

  • Byte Swap @byteSwap

  • Pop Count @popCount

  • Count Leading Zeros @clz

  • Count Trailing Zeros @ctz

  • Square Root @sqrt (See 'Alternative Implementations' below)

  • Sin @sin (See 'Alternative Implementations' below)

  • Cos @cos (See 'Alternative Implementations' below)

  • The rest of the trig functions (...which are in std.math? How do @sin and @cos relate? I feel I'm missing something here.)

  • Pretty much everything in std.math, really.

  • Pow (Wrapping) @pow

  • Pow (Non-Wrapping) @powExact

  • Pow (Overflowing) @powWithOverflow

  • Pow (Saturating) #1284

  • Exp (Wrapping) @exp

  • Exp (Non-Wrapping) @expExact

  • Exp (Overflowing) @expWithOverflow

  • Exp (Saturating) #1284

  • Exp2 (Wrapping) @exp2

  • Exp2 (Non-Wrapping) @exp2Exact

  • Exp2 (Overflowing) @exp2WithOverflow

  • Exp2 (Saturating) #1284

  • Log (Wrapping) @log

  • Log (Non-Wrapping) @logExact

  • Log (Overflowing) @logWithOverflow

  • Log (Saturating) #1284

  • Log2 (Wrapping) @log2

  • Log2 (Non-Wrapping) @log2Exact

  • Log2 (Overflowing) @log2WithOverflow

  • Log2 (Saturating) #1284

  • Log10 (Wrapping) @log10

  • Log10 (Non-Wrapping) @log10Exact

  • Log10 (Overflowing) @log10WithOverflow

  • Log10 (Saturating) #1284

  • Abs (Wrapping) @abs

  • Abs (Non-Wrapping) @absExact

  • Abs (Saturating) #1284

  • Abs (Overflowing) @absWithOverflow

  • Minimum Value @minNum

  • Maximum Value @maxNum

  • Minimum @minimum

  • Maximum @maximum

  • Copy Sign @copySign

These operations are also implemented on @Vectors with fixedpoint element types.
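
To make the widening-coercion rule above concrete (see the note in that list item), a sketch in the proposal's syntax; illustrative, not normative:

const narrow: i4_4 = 3.25;
const widened: i8_8 = narrow; // ok: 4 <= 8 integer bits and 4 <= 8 fractional bits
// const bad: i2_8 = narrow;  // compile error: the integer part cannot fit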

Our Greasy Forearms

Fixedpoint numbers add an additional approximation of real numbers. The Zig compiler currently assumes that all real-number representations are floats.

The following breaking changes will be prudent:

  1. comptime_int -> comptime_fix(i, f)

Furthermore, a number of operations that assume floats should now be made generic,

  1. @fabs -> @abs
  2. @fshl -> @shl
  3. @fshr -> @shr
  4. @floor
  5. @ceil
  6. @trunc ...and many others

@truncate and @trunc are now less clear than they already were, and should be revisited. @truncateHigh and @truncateLow are suggested as they follow common use cases, while @roundTowardPositive, @roundTowardNeg, @roundTowardZero, and @roundTowardInfinity are suggested to communicate intent precisely. The exact names of these functions are in discussion.

Fixedpoint numbers also extend the current integer codebase, causing integers (e.g. i8) to be shorthand for fixedpoints with no fractional part (e.g. i8_0). This adds valuable functionality without increasing the language's complexity or API surface area.

LLVM's builtins for fixedpoint math should be used whenever possible, otherwise we implement it within Zig atop the existing LLVM integer code. Thankfully, basic operations are quite trivial to implement:

// I know this definition is, with mild imagination, recursive. 
// Please mentally substitute llvm integer intrinsics.
const i8_8 = struct {
    bits: i16,

    const Self = @This();

    const integer_bit_width = 8;
    const fractional_bit_width = 8;

    // Add is the same as for integers
    fn add(self: Self, other: Self) Self {
        return Self{ .bits = self.bits + other.bits };
    }

    // Sub is the same as for integers
    fn sub(self: Self, other: Self) Self {
        return Self{ .bits = self.bits - other.bits };
    }

    // Multiplication requires the factors be widened and
    // the product be right-shifted by the fractional bitcount,
    // to offset scaling
    fn mul(self: Self, other: Self) Self {
        const wide = @as(i32, self.bits) * @as(i32, other.bits);
        return Self{ .bits = @intCast(i16, wide >> fractional_bit_width) };
    }

    // Division requires the dividend be widened and 
    // left-shifted by the fractional bitcount to offset scaling
    fn div(self: Self, other: Self) Self {
        const wide = @as(i32, self.bits) << fractional_bit_width;
        return Self{ .bits = @intCast(i16, @divTrunc(wide, @as(i32, other.bits))) };
    }
};

However, stdlib trigonometric functions and square root must, for the purpose of the user's sanity, return the mathematically correct answer for the number of bits. It is suggested that whatever methods are used (Taylor series, slippery polynomial approximations, Newton's method), the expansion and/or iterations be comptime-tailored to the fixedpoint format. These implementations are not for those uninclined toward numerical analysis.
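
As a taste of what comptime tailoring could look like, here is a hedged sketch of an exact fixed-point square root via Newton's method on the raw integer; the layout and names are assumptions, not part of the proposal:

// sqrt for an unsigned fixedpoint stored as raw * 2^-frac_bits:
// sqrt(x) in the same format is the integer square root of (raw << frac_bits).
// Assumes frac_bits is small enough that the shift fits in a u64.
fn sqrtFix(comptime frac_bits: u6, raw: u32) u32 {
    if (raw == 0) return 0;
    const wide = @as(u64, raw) << frac_bits;
    // Newton's iteration y' = (y + wide/y) / 2 decreases monotonically
    // from the initial guess until it reaches floor(sqrt(wide)).
    var y: u64 = wide;
    var next: u64 = (y + 1) / 2;
    while (next < y) {
        y = next;
        next = (y + wide / y) / 2;
    }
    return @intCast(u32, y);
}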

Drawbacks

  • Will break existing code, although that's par for the course nowadays
  • More stuff tossed into the compiler just so we can have infix ops and native-looking @roundTowardZero, etc.
  • sin, cos, tan, acos, asin, atan, sqrt and friends are nontrivial to implement. Numerical analysis is addictive and may eat your developers alive.

Rationale and alternatives

Alternative Syntax

The proposed i#_#/u#_# syntax matches Zig's i#/u# syntax for integers, with a _# added on as a status-quo-parser-friendly delineation between the integer and fractional parts.

const foo: i12_20 = 32.4; // not unexpected, really.

Alternatively, . instead of _ would not only match the standard Q format syntax, but turn the format names into uniquely reserved words not available in userspace, not unlike the current infix operations. This solves the issue of squatting the whole i#_# / u#_# namespace.

const foo: i12.20 = 32.4; // the platonic ideal!

If special-casing the . syntax is too egregious, f could be used in its place, matching the i/u moniker, but creating a sort of alphabet soup.

const foo: i12f20 = 32.4; // is this an integer12/float20 lovechild?

Variations of the Q syntax are also possible, such as qi12_20 / qu12_20, qi12.20 / qu12.20, iq12_20 / uq12_20, iq12.20 / uq12.20, though they add typing for no real (geddit?) advantage.

Finally, we could simply keep integer syntax at status quo and require any fractionals to use std.meta.FixedPoint(i, f), though it throws comeliness out with the bikeshed.

const foo: std.meta.FixedPoint(12, 20) = 32.4; // where's the romance?

Alternative Implementations

The trigonometric functions and sqrt are likely to fall into an awkward position where users expect them to be 100% accurate to the bit-width, but in practice 100% accuracy often takes second place to performance in numerics applications. It may be practical to provide functions for users to build custom approximations suited to their use-case.

// Returns the value of sin(x), exact to the bit-width.
// If hardware support is unavailable, uses @sinPoly with an 
// internally specified order, then iterates from an initial guess using 
// Newton's method.
const @sinExact = fn(value: var) var { ... };

// Returns the approximate sin of value, via a polynomial of the 
// specified order. 
// See Prior Art #5 and #6 for example implementations
const @sinPoly = fn(comptime order: comptime_int, value: var) var { ... };

This pattern could be expanded to other expensive functions, like:

// Returns the quotient of dividend and divisor. 
// Caller guarantees divisor != 0 and 
// @divTrunc(dividend, divisor) * divisor == dividend.
// If hardware support is unavailable, computes via @divPoly with an 
// internally specified order, then iterates from an initial guess using 
// Newton's method.
// See Prior Art #7 for a hardware-accelerated example implementation
const @divExact = fn(dividend: var, divisor: var) var { ... };

// Returns the approximate quotient of dividend and divisor, via 
// polynomials of the specified order.
// See Prior Art #7 for a hardware-accelerated example implementation
const @divPoly = fn(comptime order: comptime_int, dividend: var, divisor: var) var { ... };

While mirroring the reality of nontrivial fixedpoint numerics, and immensely practical for many situations, this increases API surface area and may feel a bit too 'batteries included' for Zig.

Alternative Formats

Fixedpoint numbers are the most performance-efficient alternative approximation of real numbers besides hardware floats. They are not the only alternative, however.

  • Unum3's Posits and Valids are a convincing alternative to floats when it comes to space efficiency and accuracy, but are slower and more complicated to implement than fixedpoint.
  • Interval Arithmetic, implemented via integer arrays, could be used to represent values between a range of [a..b]. A very convenient option for mathematics involving unknowns, but sidesteps the problem of reals altogether by worsening the accuracy, and makes most mathematics impractical even when aggressively averaging back to a single value.
  • Floats claim determinism when the floating-point unit is set to 'strictly' follow the IEEE 754 standard, but not only does this lower average performance, it is also questionably deterministic in practice.
  • 16 bit floats are implemented in some hardware, but only compete on size, and have all the problems of larger floats.

If we don't implement fixedpoint into the language, userspace libraries will rise to the challenge, but be hindered by Zig's lack of operator overloading.

As the elementary math operations for fixedpoint numbers use only trivially more operations than hardware integer mathematics (an occasional shift), it is unlikely that any alternative format would warrant language implementation: the alternatives are too complex to be practical without hardware support. However, if hardware support does arrive for other alternative formats, such as Posits and Valids, implementing fixedpoint puts us well over the edge of the slippery slope to implementing those as well.

A reasonable alternative, which lets us avoid futureproofing the compiler and accruing technical debt against every practical future number format, is to implement operator overloading.

Alternative Social Constructs

Aside from syntax and overall grooviness, nothing we're doing here is really specific to Zig. Our software implementation could be upstreamed into LLVM for other languages to benefit from.

Prior art

  1. Fixed-point arithmetic, Wikipedia article
  2. Fixed Point Arithmetic and Tricks
  3. The Tonc Tutorial on Fixed-Point Numbers and LUTs
  4. Doing it Fast: Fixed Point Arithmetic
  5. Another fast fixed-point sine approximation
  6. Nvidia's sin approximation used in CG
  7. Fast fixed-point divider based on Newton-Raphson method and piecewise polynomial approximation
  8. ZigGBA's Fixedpoint Type
  9. libfixmath, an implementation the author does not particularly care for.
  10. capsize, a rather brutal macro-based implementation in Rust. Includes many approximation functions.
  11. fixed, a less featureful, generics-based Rust implementation.
  • ...and many more.

Unresolved questions

None. Absolutely None. If you've read this far, how can you doubt?

Future possibilities

Fixedpoint numbers' small size, cross-platform support, enforced determinism, and computational simplicity put them in an ideal niche that floats don't touch.

The main barrier to wider adoption has been ease of use: even when fixedpoint makes more sense, the friction of finding a good fixedpoint library, or writing one's own, pushes users to "just use floats" unless they absolutely have to.

Making these numerics first-class citizens in Zig with the power of comptime can open up a whole class of application development to less-specialized programmers. As Zig takes over the world, this will encourage hardware vendors to implement more fixedpoint functions in silicon, bringing us to a bright and happy future of @Vector(64, i12_4), free of the scourge of -0 and NaN.

Also, basic integer types could be thought of as special cases of fixed point numbers. i32 is equivalent to i32_0 in the proposed syntax.

Diltsman avatar Apr 28 '20 02:04 Diltsman

Also, basic integer types could be thought of as special cases of fixed point numbers. i32 is equivalent to i32_0 in the proposed syntax.

Interesting point. Operations like @intToFloat could then be dropped in favor of @fixToFloat, keeping the fn sig footprint down to two types, except in cases where an integer (i.e. no fractional bits) is explicitly required.

Edit: Updated the RFC accordingly.

@floopfloopfloopfloopfloop any suggestions on how this might interact with #3806?

daurnimator avatar Nov 25 '20 03:11 daurnimator

I agree wholeheartedly with @floopfloopfloopfloopfloop's proposal, although I'd suggest one change to the syntax: uMpN/iMpN, for an unsigned/signed number ~~with M integer bits and N fractional bits~~ (see next comment): u0p23, i2p14. It looks cleaner, and has precedent in hex floating point. ~~(The question is: do we allow i0p{}? That doesn't really make sense, does it? If we allow that, what about smaller intervals?~~ Come to think of it, does i0 make sense?)

EleanorNB avatar Nov 25 '20 03:11 EleanorNB

Actually, one more: I'd suggest having uMpN/iMpN refer to a fixpoint with M total bits, N of which are fractional. The rationale is that we can then see the size of the type at a glance, and edit a type to have fractional bits in-place without doing mental arithmetic (i.e. all xMp_ will be memory compatible). Also, we can then extend it to platform-sized ints (see https://github.com/ziglang/zig/issues/5185#issuecomment-733558882): usizep3, idatap10, etc.

EleanorNB avatar Nov 25 '20 08:11 EleanorNB

@daurnimator I think that proposal will work as expected, but extended to four parameters rather than two? A builder struct seems like it'd become useful at that point. I don't have time to read the proposal deeply atm.

@EleanorNB Having used libraries with both TotalBits.FractionalBits and IntegerBits.FractionalBits naming formats, I've found the latter is more practical. The most important thing for algorithms is that one have the proper number of integer bits to avoid overflow, and the remaining issues can be solved by increasing the fractional bit count as necessary. The mental arithmetic to find the total size is trivial, and the fact one tends to experiment with different sizes during development leads one to use alias types and @sizeOf(T) anyway, increasing robustness of the code and iteration speed.

As for platform-sized ints, I disagree with names like usizep3 and idatap10 because

  1. the types proposed represent semantic information that is never fractional
  2. it all blends together into an alphabet soup

As for the concept of platform-sized fractional bitcounts, it can already either be solved with alias/@sizeOf approach, or we can explicitly define a const i16_whatever = @Fixedpoint(true, 16, @sizeOf(isize)-16);. The technique to use is left to the user's preference.

Just going to state something obvious here, because it wasn't explicitly mentioned above: this would make it possible to write a function that accepts both floats and fixedpoints.

I'm currently writing a library that generates random noise. It generates deterministic u32 noise, but then has functionality for normalizing that to a number 0..1 and doing operations on that. Currently I'm generating an FBM-based heightmap for a terrain. Some games will need this to be deterministic, and as we all know with floats, "it's complicated".

I already support different float types with anytype like so, so it would be nice if this was approved and then fixed points just worked.

pub fn noise_fbm_2d(pos_x: anytype, pos_y: anytype, seed: u32, octave_count_c: u32, gain: @TypeOf(pos_x), feature_size: @TypeOf(pos_x)) @TypeOf(pos_x) {

Srekel avatar May 11 '21 04:05 Srekel

@Srekel "I already support different float types" What does this mean? There are many representations.

Do you support both binary or decimal floating-point types or even more? Here is another fixpoint library by @geemili (discord game-dev).

You might want to read this comment as well: "To be clear, I'm highly skeptical of introducing native decimal fraction support in Zig. I'm just saying that if we do, then #1974 is not sufficient, unless it is extended to specify the base as well."

Some nice link on binary fixed-point.

matu3ba avatar May 11 '21 07:05 matu3ba

Given that I'm still learning Zig, there may be a better way to do what I'm doing. But as far as I understand, since I use anytype, it'll accept anything that doesn't cause a compile error: anything I can do general math operations on (+ - * /) and can cast to an int, currently with @floatToInt.

Srekel avatar May 11 '21 09:05 Srekel

Another use-case would be fonts and image formats which encode and compute using fixed-point representations.

tauoverpi avatar May 29 '22 15:05 tauoverpi

just ran into this recently implementing my own opentype/truetype parser and planned on implementing in userspace unless this was accepted/resolved

nektro avatar May 30 '22 04:05 nektro

FYI: n-bits isn't the only common format. Financial systems use the number of base10 digits. For example Nasdaq (and many others) use an implied "6.4" format in 32-bits for prices. From Nasdaq's spec:

Prices are numeric fields with an implied 4 decimal places. Prices are to be treated as unsigned numeric fields, unless designated otherwise.

Trivia: This is why Nasdaq doesn't trade Berkshire Hathaway. The price exceeds 6 digits.

TwoClocks avatar Sep 10 '22 17:09 TwoClocks

See ./lib/compiler_rt/README.md:

Goals:

    zig as linker for object files produced by other compilers => function compatibility to compiler-rt and libgcc for same-named functions
        compatibility conflict between compiler-rt and libgcc: prefer compiler-rt
    symbol-level compatibility low-priority compared to emitted calls by llvm
        symbol-level compatibility: libgcc even lower priority
    add zig-specific language runtime features, see #7265
        example: arbitrary bit width integer arithmetic
        lower to call those functions for e.g. multiplying two i12345 numbers together
        proper naming + documentation for standardizing (allow languages to follow our example)

libgcc includes bespoke runtime routines, which would be the basis for fixed-point number support. As of now, however, compiler_rt has no such routines.

matu3ba avatar Sep 10 '22 21:09 matu3ba

@TwoClocks What is the minimal subset of these runtime routines in compiler_rt needed to provide something usable?

I would assume the conversion can be generalized at the cost of efficiency (DSPs use custom hardware).

matu3ba avatar Sep 10 '22 21:09 matu3ba

In my use case I write a 2D platformer game on wasm4 with a screen resolution of 160x160, so even a single-pixel error is noticeable on the screen. Initially I used i32 everywhere for vectors, coordinates and calculations. Then I moved to f32 because of rounding errors. Now I'm moving towards fixedpoint and I don't particularly like it. The problem with f32 is as follows:

Imagine a situation where an actor's coordinate is x=5, then it moves 5 points to the left: x = 5 - 5 => 0.00000000000000001. The terrain under the actor has x=0, so visually and logically the actor should fall, but technically 0.0000000000000001 > 0, so this doesn't happen.

In the process of moving towards fixedpoint I have this example, a camera update function. First the status quo with f32:

    pub fn update(self: *Camera, hero: Rectangle, scrolloff: Scrolloff, map_size: Rectangle) void {
        const desired_min_x = hero.x - scrolloff.x;
        const desired_max_x = hero.x + hero.w + scrolloff.x;
        const allowed_min_x = map_size.x + self.offset.x;
        const allowed_max_x = map_size.x + map_size.w - self.offset.x;
        if (desired_min_x < self.target.x - self.offset.x) {
            self.target.x = std.math.clamp(desired_min_x + self.offset.x, allowed_min_x, allowed_max_x);
        } else if (desired_max_x > self.target.x + self.offset.x) {
            self.target.x = std.math.clamp(desired_max_x - self.offset.x, allowed_min_x, allowed_max_x);
        }
        const desired_min_y = hero.y - scrolloff.y;
        const desired_max_y = hero.y + hero.h + scrolloff.y;
        const allowed_min_y = map_size.y + self.offset.y;
        const allowed_max_y = map_size.y + map_size.h - self.offset.y;
        if (desired_min_y < self.target.y - self.offset.y) {
            self.target.y = std.math.clamp(desired_min_y + self.offset.y, allowed_min_y, allowed_max_y);
        } else if (desired_max_y > self.target.y + self.offset.y) {
            self.target.y = std.math.clamp(desired_max_y - self.offset.y, allowed_min_y, allowed_max_y);
        }
    }

Then the same with a user-space fixedpoint, thanks to @MasterQ32 for the implementation idea:

const std = @import("std");
const assert = std.debug.assert;

fn FixedPoint(comptime T: type, comptime scaling: comptime_int) type {
    return struct {
        const FP = @This();
        raw: T,

        pub fn init(v: i32) FP {
            return .{ .raw = scaling * v };
        }
        pub fn initFromFloat(v: f32) FP {
            return .{ .raw = @floatToInt(T, scaling * v) };
        }
        pub fn unscale(fp: FP) i32 {
            return fp.raw / scaling;
        }

        pub fn add(a: FP, b: FP) FP {
            return .{ .raw = a.raw + b.raw };
        }
        pub fn sub(a: FP, b: FP) FP {
            return .{ .raw = a.raw - b.raw };
        }
        pub fn mul(a: FP, b: FP) FP {
            return .{ .raw = (a.raw * b.raw) / scaling };
        }
        pub fn div(a: FP, b: FP) FP {
            return .{ .raw = (scaling * a.raw) / b.raw };
        }
        pub fn eq(a: FP, b: FP) bool {
            return a.raw == b.raw;
        }
        pub fn lt(a: FP, b: FP) bool {
            return a.raw < b.raw;
        }
        pub fn gt(a: FP, b: FP) bool {
            return a.raw > b.raw;
        }
        pub fn clamp(val: FP, lower: FP, upper: FP) FP {
            assert(lower.raw <= upper.raw);
            return max(lower, min(val, upper));
        }
        pub fn min(a: FP, b: FP) FP {
            return if (a.raw < b.raw) a else b;
        }
        pub fn max(a: FP, b: FP) FP {
            return if (a.raw > b.raw) a else b;
        }
    };
}
pub const WorldCoordinate = FixedPoint(i32, 1000);
const WC = WorldCoordinate;
pub const WorldPosition = struct {
    x: WorldCoordinate,
    y: WorldCoordinate,
};

    pub fn update(self: *Camera, hero: Rectangle, scrolloff: WorldPosition, map_size: Rectangle) void {
        const desired_min_x = hero.x.sub(scrolloff.x);
        const desired_max_x = hero.x.add(hero.w).add(scrolloff.x);
        const allowed_min_x = map_size.x.add(self.offset.x);
        const allowed_max_x = map_size.x.add(map_size.w).sub(self.offset.x);
        if (desired_min_x.lt(self.target.x.sub(self.offset.x))) {
            self.target.x = desired_min_x.add(self.offset.x).clamp(allowed_min_x, allowed_max_x);
        } else if (desired_max_x.gt(self.target.x.add(self.offset.x))) {
            self.target.x = desired_max_x.sub(self.offset.x).clamp(allowed_min_x, allowed_max_x);
        }
        const desired_min_y = hero.y.sub(scrolloff.y);
        const desired_max_y = hero.y.add(hero.h).add(scrolloff.y);
        const allowed_min_y = map_size.y.add(self.offset.y);
        const allowed_max_y = map_size.y.add(map_size.h).sub(self.offset.y);
        if (desired_min_y.lt(self.target.y.sub(self.offset.y))) {
            self.target.y = desired_min_y.add(self.offset.y).clamp(allowed_min_y, allowed_max_y);
        } else if (desired_max_y.gt(self.target.y.add(self.offset.y))) {
            self.target.y = desired_max_y.sub(self.offset.y).clamp(allowed_min_y, allowed_max_y);
        }
    }
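
With this type, the earlier actor example becomes exact; a quick sketch using the aliases above:

test "subtraction is exact" {
    // x = 5 - 5 is now exactly 0 on every platform, unlike the f32 version:
    const x = WC.init(5).sub(WC.init(5));
    assert(x.eq(WC.init(0))); // no 0.00000000000000001 residue
}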

Some subjective opinions:

  • clamp looks really fine - a.clamp(allowed_min_y, allowed_max_y).
  • All the add, sub etc. are chainable and look OK - map_size.x.add(map_size.w).sub(self.offset.x). But it is hard to follow the precedence: parens are now overloaded as function calls fun(), instead of specifically marking precedence (a + b).
  • Comparisons are border-line readable - desired_min_y.lt(self.target.y.sub(self.offset.y)), probably because of the precedence.
  • Plain values are a bit more involved than they should be - WC.init(0).

When I imagine doing the same for collision detection, I feel pain. Have you ever mixed up x and y? Now it is convoluted even more with all these function calls. Vector math is already hard to write; it shouldn't be even harder.

Other solutions to the problem instead of a fixedpoint:

  • Resort to using f32 and doing roundToInt(a) > roundToInt(b) for all comparisons; this looks like the lesser evil.
  • Use fixedpoint implicitly as i32 without a type system; just be careful and remember to scale/descale coordinates at the proper places (sketched below).
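
A sketch of that second option, for comparison (the scale value is arbitrary):

// Fixed-point kept implicitly in i32; nothing stops you from
// accidentally mixing scaled and unscaled values.
fn implicitFixedPoint() i32 {
    const scale = 1000;
    var x: i32 = 5 * scale; // 5.0 in world units
    x -= 5 * scale;         // exactly 0; deterministic everywhere
    return @divTrunc(x, scale); // remember to descale when drawing
}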

greenfork avatar Sep 23 '22 12:09 greenfork

They are useful for a) financial systems

Not useful for financial systems. Fixed-point still uses a binary fraction, as diagrammed earlier in the thread. Financial systems expect a decimal fraction. There is no hardware support for decimal floating-point in commonly available CPUs used in most computers these days. Binary floating-point won due to being easier to implement in hardware, so once again speed won over usefulness.

Fixed-point is not any more or less accurate than floating-point; it is just more limited in range for a given number of bits used to represent the values.

Providing decimal floating-point support in Zig would be more useful than binary fixed-point. Providing programmers with a decimal floating-point type might get them to actually consider (or become aware of) the difference between decimal and binary floating point, and we could all sleep a little better at night.

Edit: It is important to be very clear when saying "fixed point", since there are multiple possible implementations:

https://en.wikipedia.org/wiki/Fixed-point_arithmetic

Using fixed-point in the sense that, for example, 1.23 would be stored with an integer (i32) as 1230, with an implicit scaling factor of 1000, works for things like financial and other such calculations.
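
Concretely, a sketch of that convention (values are made up):

// Decimal fixed-point: i32 with an implied scaling factor of 1000.
const scaling = 1000;
const a: i32 = 1230; // 1.23
const b: i32 = 4560; // 4.56
const sum = a + b; // 5790 => 5.79; decimal addition is exact
const prod = @divTrunc(@as(i64, a) * b, scaling); // 5608 => 5.608 (5.6088 truncated)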

However, if the fixed-point representation is being used like a binary floating point format, i.e. base-2 exponents in the fraction, then the benefit of using fixed-point for fractional-decimal representation goes away.

It is unclear to me which method of "fixed point" is being advocated for in the previous posts, so my comments may be just noise. However, the long post on April 27, 2020 shows a base-2 exponent, so that would not be suited for decimal fractions.

dnotq avatar Sep 28 '22 22:09 dnotq

Providing decimal floating-point support in Zig would be more useful than binary fixed-point.

To be fair, decimal floating-point has not yet been optimized for performance and memory representation, so this would be a research area: "Finally, one should mention that considerable additional work is required in order to enable mixed-radix comparisons in the case when the decimal floating-point number is stored in dense-packed-decimal representation."

https://hal.archives-ouvertes.fr/hal-01021928v2/document "Comparison between binary and decimal floating-point numbers Nicolas Brisebarre, Christoph Lauter, Marc Mezzarobba, and Jean-Michel Muller"

You really don't want to unpack the numbers for every comparison, if possible.

matu3ba avatar Sep 28 '22 23:09 matu3ba

Fixed-point is not any more or less accurate than floating-point, they are just more limited in range for a given number of bits used to represent the values.

This depends on the use-case. Ignoring overflow and other special cases, fixed-point addition is always exact, while floating-point addition has an accuracy of +/- 0.5 ulp. This makes floating-point poorly suited for tasks like cumulative sums.

On the other hand, fixed-point can be poorly suited for division. Something like (N/M)*M can be off by 1 ulp for floating-point, but as much as N-1 ulp for fixed-point.
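
A worked Q16.16 example of that division error, with truncating operations (numbers worked out by hand):

// N = 1.0, M = 3.0, computing (N/M)*M:
const one: i64 = 1 << 16;   // 1.0
const three: i64 = 3 << 16; // 3.0
const q = @divTrunc(one << 16, three); // 21845 = 0.33332825...
const back = (q * three) >> 16;        // 65535 = 0.99998474..., 1 ulp below 1.0
// With unluckier N and M the fixed-point round-trip error grows much larger.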

Providing decimal floating-point support in Zig would be more useful than binary fixed-point. Providing programmers with a decimal floating-point type might get them to actually consider (or become aware of) the difference between decimal and binary floating point, and we could all sleep a little better at night.

I assume you meant "decimal fixed-point" and not "decimal floating-point?" GnuCash is an example of financial software that uses the former, and @TwoClocks mentioned that Nasdaq uses a form of fixed-point as well.

ryanavella avatar Sep 28 '22 23:09 ryanavella

This really should not be a language problem, IMO, but if supporting complex data as native types in Zig is being considered, decimal floating point would be a good one. There are a lot of mistakes made in software because programmers do not understand what a float or double really are, and reach for those when doing financial kinds of computing, or calculating metric values with fractional components, etc.

There are decimal floating point representations that have been optimized for performance as well as anyone is willing to make the effort. For example, one such library:

https://www.speleotrove.com/decimal/

There are also mainframe systems that have native hardware (and software) support for decimal floating point, mainly because (thankfully) those kinds of computers are used for large financial applications.

Performance always seems to take precedence over everything, to the point of doing things incorrectly. Commodity computer hardware does a crappy job at providing any kind of support for decimal floating point, so for now it will have to be 100% a software solution. Yes, it is slower than binary floating point, but that is the mess we have created for ourselves (as people making computers), and using a software lib with slower decimal floating point computations is simply the cost of doing financial and other calculations correctly.

I assume you meant "decimal fixed-point" and not "decimal floating-point?"

Nope, I was saying I would rather have/see "decimal floating point" support in Zig, over "binary fixed point" support.

GnuCash is an example of financial software that uses the former,

I have not looked at how GnuCash uses fixed point, so I cannot comment other than to say, just because something is implemented does not mean it is correct. I'm not saying GnuCash has done anything wrong; there are ways to do financial math with integers and such (tracking thousandths of a penny as integers, for example). And when you implement your own fixed-point functions, you get to decide what the fractional exponent represents, so in the case of GnuCash maybe it tracks the fractions in base-10 rather than base-2. However, if I were considering using any lib for any serious financial application, I would audit the code to make sure I understood what it was doing before I used it.

and @TwoClocks mentioned that Nasdaq uses a form of fixed-point as well.

I would not make any assumptions without seeing the whole Nasdaq specification. An unsigned 32-bit value cannot represent an entire 6.4 (10-digit) value, so I would be careful with how that quote is being interpreted. It seems more like a spec for data exchange where the data is being imported or exported as text representations of the numeric data, rather than describing how the Nasdaq stores internal values or performs calculations on their financial values.

dnotq avatar Sep 29 '22 00:09 dnotq

@dnotq I'm a little confused as to what exactly you are advocating. When you say "floating-point decimal" do you mean a fixed-width type like LLVM's and GCC's _Decimal32, or do you mean an arbitrary-precision type like Python's decimal module?

If you mean the former, there is already an issue for this (see #4221), and rounding errors are still an issue re: cumulative sums. (sidenote: this is part of why GnuCash switched to fixed-point integers in version 1.6)

If you mean the latter, then I think your suggestion would be somewhat at odds with the Zig selling point of "no hidden allocations." Usually these kinds of types would be provided in std rather than as a builtin.

Regardless, fixed-point integers are used for applications other than finance, so just because a decimal type would be better in some use-cases does not negate the need for fixed-point.

ryanavella avatar Sep 29 '22 01:09 ryanavella

It seems more like a spec for data exchange where the data is being imported or exported as text representations of the numeric data, rather than describing how the Nasdaq stores internal values or performs calculations on their financial values.

It is both the price format for real-time communication to/from the exchange and the internal representation the exchange uses. It isn't just Nasdaq; most US equities venues use some base-10 implied-decimal format that is 32 to 64 bits long. For 64-bit formats, it's usually an implied "8.11" format, which has enough resolution after the decimal point for accurate bond prices. Most FX as well, but sometimes 8 whole digits isn't enough for some currency pairs.

You are correct that it doesn't cover the whole range, but it doesn't need to. Nasdaq and most other exchanges do not accept prices above 200K, so implied 6.4 works fine.

The dirty secret to most exchanges: They don't do any math at all on prices. They just need to compare prices. As long as it compares like an int, the exchange doesn't really care. It's just a presentation/downstream issue.
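
For example, a sketch of an implied "6.4" price living in a u32 (values are made up):

// 123.4567 travels on the wire and sits in memory as 1234567.
const bid: u32 = 1234567; // 123.4567
const ask: u32 = 1234600; // 123.4600
// Matching needs only integer comparison, never arithmetic:
const crossed = bid >= ask; // false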

If you need to do * or /, you either convert to a 64-bit format or use something like Java's BigDecimal. Floating point is a non-starter.

Most libraries use n bits for the implied decimal, like Rust's "fixed" crate. I was just pointing out that there is a common base-10 use case as well.

I'm not sure what adding it to the language gets you, though. Seems fine for a user-land lib, unless I'm missing something.

TwoClocks avatar Sep 29 '22 01:09 TwoClocks

@ryanavella I made an edit above in my original post, since it was unclear to me, after re-reading the OP, what was being asked for to begin with.

Mostly I'm trying to stress that the term "fixed point" is overloaded and does not always mean it is safe to use for accurate decimal fractions, since that depends on the implementation.

The post on April 27, 2020 says fixed-point is useful for financial systems, but then goes on to show a base-2 exponent example for the fractional part of the number, which is not useful for financial systems.

In my experience binary floating point is very misunderstood and used incorrectly in places where accurate decimal-fractions are important (and sometimes critical). If a language provides a type that "looks like decimal", yet does not behave like decimal, it can be detrimental.

Even though binary floating point is easy and fast to implement in hardware (which is why it was chosen), it was the wrong choice IMO, and software as a whole suffers for that decision.

@TwoClocks

I'm not sure what adding it to the language gets you through. Seems fine for a user-land lib, unless I'm missing something.

Agreed. Although I'm not clear if Zig is trying to represent more complex non-CPU data types in the language or not? If Zig is still primarily "a better C", then I would say leave something like this to a lib. On the other hand, IIRC, C is getting native DecFP support in the spec.

dnotq avatar Sep 29 '22 01:09 dnotq