
Better int type support in Dart, now that targeting wasm is (nearly) possible

Status: Open. lukehutch opened this issue 2 years ago • 21 comments

(Sorry to ask this in an issue -- the discussions section is not enabled for this github project...)

Dart's int type has 53 bits of precision, a hangover from Javascript's numerical types. However Javascript is only one of several targets for Dart at this point. And Dart will soon be able to target wasm, which has proper 32-bit and 64-bit integer types (as does every other Dart target platform).

At this point, can proper [u]int32 and [u]int64 types be added to Dart? For that matter, having a proper uint8 type would make byte data handling much easier on Dart (and probably much more efficient in some cases -- I have seen libraries using List<int> (not backed by Uint8List) to represent bytes, which is horribly inefficient...). [u]int16 would also be useful sometimes.

int32 can already be supported out of the box on Javascript, since it fits in Javascript's numerical type, and int64 could be emulated in Javascript by using two int32s.
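Emulating int64 with two 32-bit halves could look roughly like the sketch below. `Int64Pair` is a hypothetical name, not an SDK type (package:fixnum does ship real `Int32`/`Int64` classes, but its implementation differs from this). The low halves are added first, and any carry is propagated into the high half. Every intermediate value stays below 2^34, well inside the 2^53 range that JavaScript numbers represent exactly.

```dart
/// Hypothetical emulated 64-bit integer made of two unsigned 32-bit halves.
class Int64Pair {
  final int hi; // upper 32 bits, 0 .. 0xFFFFFFFF
  final int lo; // lower 32 bits, 0 .. 0xFFFFFFFF

  const Int64Pair(this.hi, this.lo);

  /// Adds two values, wrapping around on 64-bit overflow.
  Int64Pair operator +(Int64Pair other) {
    final loSum = lo + other.lo; // at most 33 bits, so no precision loss
    final carry = loSum > 0xFFFFFFFF ? 1 : 0; // carry into the upper half
    final hiSum = hi + other.hi + carry; // also at most 33 bits
    return Int64Pair(hiSum & 0xFFFFFFFF, loSum & 0xFFFFFFFF);
  }

  @override
  String toString() =>
      '0x${hi.toRadixString(16).padLeft(8, '0')}'
      '${lo.toRadixString(16).padLeft(8, '0')}';
}
```

For example, `Int64Pair(0, 0xFFFFFFFF) + Int64Pair(0, 1)` carries into the upper half and prints as `0x0000000100000000`. Subtraction and multiplication need more bookkeeping, which is part of why emulated 64-bit types are slower than native ones.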

lukehutch, Jul 06 '23

See Also: https://github.com/dart-lang/sdk/issues/52250

gintominto5329, Jul 06 '23

My take:

int32/uint32

  • + Much needed
  • + Because u64 operations are not as efficient as u32 ones on current CPUs

uint8, and int16/uint16

  • - All int types narrower than 32 bits are slower than their 32-bit counterparts (int32/uint32, i.e. C's int)
  • - "probably much more efficient in some cases" is a myth: Dart's objects are quite heavy, 16 bytes (128 bits) per dart::int, and uint8 won't change that
  • - More clutter in the sdk

gintominto5329, Jul 06 '23

This is mainly a library issue, not a language issue, but it intersects with the language specification if it has to say anything about these new types.

So, do they need to be mentioned in the language specification?

We could just add Int32, Uint8, etc., as new types in dart:typed_data, with no relation to each other, other than having toUint8(), toInt32() methods, and which are not subtypes of int. You'd have to call Int32(42) to create an instance from an integer (no literal syntax), but the compiler can definitely optimize that if it knows about the Int32 class.

So no, no need to be part of the language.

The things that would make it a language issue is:

  • Literals. Say you can write 255u8 to get a Uint8 with value 255, and 2i32 to get an Int32 with value 2. That would have to be in the language. But it's probably not particularly important.
  • Being a subtype of int. That's unlikely to happen, because int is already a type with values, and these values are not int values. (In particular Uint64 contains values that int doesn't.) And it would add special-casing to all the code that currently relies on int being a well known type.
  • They could be subtypes of num, but that would likely break a lot of code which assumes there can be only two: num x; if (x is double) ... else { x as int; ... }. With new Dart 3.0 sealed semantics, it'll also break switch (someNum) { int x => ..., double x => ...}.
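To make that breakage concrete, here is a small sketch of the kind of existing code in question, using a hypothetical `describe` function. It branches on one `num` subtype and casts everything else to the other, which is only sound while `int` and `double` are the only two. A third `num` subtype would make the cast in the `else` branch throw at run time.

```dart
/// Typical existing code that assumes `num` has exactly two subtypes.
String describe(num x) {
  if (x is int) {
    return 'int ${x.isEven ? 'even' : 'odd'}';
  } else {
    // Only safe because every non-int num is currently a double.
    final d = x as double;
    return 'double ${d.toStringAsFixed(1)}';
  }
}
```

`describe(3)` returns `int odd` and `describe(2.5)` returns `double 2.5`; a value of some new `Int32 extends num` type would fail the `as double` cast.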

So, if there should be any relation, we'd have to add yet-another-Number supertype of num, and add these new types below it. That doesn't need to be in the language semantics. We don't specify the supertypes of int and double there (like Comparable<num>.) It's also not particularly useful, so we might as well just not add a common supertype.

With no type-relation to int or num, and no literal syntax, this is purely a library request.

You can create those types today, but obviously not have them optimized as much as you'd probably want.
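A minimal sketch of such a library-only type, assuming a hypothetical `Int32` class (not part of dart:typed_data today). It has no literal syntax, is not a subtype of `int` or `num`, and wraps on overflow using the real `int.toSigned` method. Nothing here is specially optimized by any compiler.

```dart
/// Hypothetical library-level Int32: not a subtype of int or num,
/// created explicitly with Int32(42) since there is no literal syntax.
class Int32 {
  /// Always within -2^31 .. 2^31 - 1.
  final int _value;

  /// Wraps any int to 32-bit two's complement.
  Int32(int value) : _value = value.toSigned(32);

  /// Wrapping 32-bit addition; both operands must already be Int32.
  Int32 operator +(Int32 other) => Int32(_value + other._value);

  /// Explicit conversion back to a plain int.
  int toInt() => _value;

  @override
  bool operator ==(Object other) => other is Int32 && other._value == _value;

  @override
  int get hashCode => _value.hashCode;

  @override
  String toString() => 'Int32($_value)';
}
```

For instance, `Int32(0x7FFFFFFF) + Int32(1)` wraps to `Int32(-2147483648)`, and every interaction with ordinary `int` code goes through `Int32(...)` or `.toInt()`.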

lrhn, Jul 06 '23

Dart's int type has 53 bits of precision,

Only when compiled into javascript. On the other platforms including wasm, int has 64 bits precision.

Cat-sushi, Jul 06 '23

It's blowing my mind that this is a library issue, not a language issue, since ints are fundamental (primitive) to every language I have used before.

There must be situations where having native support in the language (including at the opcode level) for ints of different lengths will produce much faster code.

One unrelated strong argument for int64 / uint64 / int32 / uint32 etc. is that the semantics of these types will be the same on every platform -- which is not the case right now with dart's int. Your code can break only on the Javascript platform if you overflow the 53 available int bits on that platform, while the other platforms will continue to work fine.

lukehutch, Jul 07 '23

It's definitely going to be the same people making the decision whether it's a library or language change; the only difference is whether it requires a language specification change.

Unless a single platform team decides to provide the special numbers only on their platform, then they can do that without needing everybody else to agree. (Well, maybe, if they can do it properly!)

Adding new types is not without cost to users. Today, if you return an int, everybody can receive that and use it with all their other ints.

If you start returning an Int32, which is not an int, the user has to either use other Int32s to work with it, or convert it to int, do computations on that, and maybe convert it back to Int32.

You can't do Int32 i = ...; Int64 j = ...; print(i + j); it most likely won't type-check, because + of Int32 is defined as Int32 operator+(Int32 other);.

Maybe we can get clever and introduce abstract supertypes, like:

abstract class IntegerUpTo32 { Int32 toInt32(); }

implemented by Int8, Int16, Int32, Uint8 and Uint16, the types whose numerical values can safely be converted to signed 32-bit, and let Int32's + be

   Int32 operator+(IntegerUpTo32 other) => _add(this, other.toInt32());

But that also makes Int32.operator+ be less efficient, if it has to check the other incoming value's type. Better to just be Int32 operator+(Int32 other); and require the caller to do the .toInt32() as needed.

And then we end up with one operation returning an Int32, another returning a Uint32, and some poor third-party code needing to combine the two in some way. Adding them safely would probably mean first.toInt64() + other.toInt64().

Lots of complexity at any point where you work with these new integer types, and only in the hope that it might be more efficient. (If you tell people "using Int32 is more efficient than int", they'll start using Int32 everywhere, and it won't be more efficient, because they will be converting back and forth whenever they hit int-based operations like list[index.toInt()] or Uri(..., port: num32.toInt()).)

Today's design is that a single value is always represented as an int, which is correct and very efficient up to some size (63 bits on native 64-bit, 31 bits on 32-bit, 53 bits on web). If you need multiple values of the same size, use typed-data lists, Uint8List(8). Those actually pack the values densely. There is no guarantee that a List<Uint32> will only use 32 bits for each entry on a 64-bit system, or on a 32-bit system for that matter, if it boxes each value. Changing the layout of the generic List implementation to depend on the type requires a kind of type-specialization that Dart doesn't do today. Same for having Int32 x; as a local variable, it may still be a full 64-bit slot on the stack, and may even be boxed and refer to a memory allocation containing, at least, two 64-bit words.
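As a small illustration of the typed-data route, the sketch below uses only real dart:typed_data APIs (`Uint8List.fromList`, `Uint32List.fromList`, `lengthInBytes`); the helper names `toBytes` and `toWords` are hypothetical. Each element occupies exactly its declared width, and out-of-range values keep only their low bits when stored.

```dart
import 'dart:typed_data';

/// Copies [values] into a Uint8List: one byte per element, stored
/// contiguously. Values outside 0..255 keep only their low 8 bits.
Uint8List toBytes(List<int> values) => Uint8List.fromList(values);

/// Same idea for unsigned 32-bit values: four bytes per element.
Uint32List toWords(List<int> values) => Uint32List.fromList(values);
```

`toBytes([1, 255, 256, 300])` holds the bytes 1, 255, 0, 44 in 4 bytes of element storage, and `toWords([1, 2, 3]).lengthInBytes` is 12.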

Dart is not C. You cannot control the memory layout. Using smaller integer values won't necessarily save space. It may save execution time, if a 32-bit multiplication is cheaper than a 64-bit multiplication, but if that's the use case, you really only need an int mulInt32(int a, int b); operation, not a type.
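A sketch of what such a `mulInt32` helper could look like, assuming a native 64-bit `int` (the name is hypothetical, not an SDK function). Both operands are wrapped to 32 bits, multiplied exactly (the product of two 32-bit values fits in 63 bits), and the result is truncated back with `toSigned(32)`. This pins down the semantics only; whether a compiler lowers it to a single 32-bit multiply instruction is not something the sketch can promise. On the web, products above 2^53 are rounded before truncation, so the low bits may be wrong there.

```dart
/// Hypothetical helper: 32-bit two's-complement multiplication with
/// wraparound. Assumes a native 64-bit int so the full product is exact.
int mulInt32(int a, int b) =>
    (a.toSigned(32) * b.toSigned(32)).toSigned(32);
```

For example, `mulInt32(0x7FFFFFFF, 2)` is `-2` and `mulInt32(0x10000, 0x10000)` is `0`, matching a 32-bit register that discards the high half of the product.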

lrhn, Jul 07 '23

I'm of the strong opinion that languages should never do automatic type coercion -- and I'm glad that Dart does almost none. Automatic type coercion usually only applies to widening types, but it has been the source of numerous bugs that I have found over the years, e.g. when integer math is expected but a floating point value is accidentally introduced in the middle of the expression.

Even with integer math, I don't believe automatically widening to the widest int in an expression is the way to go. All type coercion should be manual, so that you know what width you are working with in each sub-expression.

On one of your latter points -- to my knowledge, List<int> does not pack efficiently into (4 * list.length) bytes, either. So the expectation would not change for List<int64>, etc. (But then you would know you are getting 64-bits of precision on all platforms, including the Web...)

Additionally, Uint8List could be joined by cousins Uint16List, Uint32List, Uint64List, etc., which each had the same sort of memory-efficient implementation as Uint8List. And Uint8List could have a method .get(int index) added that returns actual uint8 values, rather than int values.

lukehutch, Jul 07 '23

Uint16List, etc., already exist. That's why I don't see much need for a standalone Uint16 type, because the case where you really need to care about memory layout, long repetitions, is already handled.

Doing precision preserving operations can be done directly on int, since it's large enough to hold anything except Uint64, perhaps with extension types for better ergonomics. There doesn't have to be any new value sizes for that.
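One reading of the extension-type suggestion, as a sketch: `Uint8` below is a hypothetical name, not an SDK type, and extension types require Dart 3.3 or later. An extension type is erased to its representation type, so at run time a `Uint8` is just an `int`: no extra boxing, but no denser packing either. What it buys is that the masking lives in one place instead of at every call site.

```dart
/// Hypothetical wrapper over int with 8-bit unsigned semantics.
/// Requires Dart 3.3+; at run time a Uint8 is just an int.
extension type Uint8._(int value) {
  /// Wraps any int into 0..255, like storing into a Uint8List.
  factory Uint8(int v) => Uint8._(v & 0xFF);

  Uint8 operator +(Uint8 other) => Uint8(value + other.value);
  Uint8 operator -(Uint8 other) => Uint8(value - other.value);
  Uint8 operator <<(int shift) => Uint8(value << shift);
  Uint8 operator >>(int shift) => Uint8(value >> shift);
  Uint8 operator ~() => Uint8(~value);
}
```

With this, `Uint8(250) + Uint8(10)` has value 4 and `~Uint8(0)` has value 255, without writing `& 0xFF` at each use.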

lrhn, Jul 07 '23

There are other arguments to be made in favor of supporting int types of a range of widths. I am currently working with the image library, which consists of pure Dart image processing code, and it is extremely slow (for cropping, image scaling, encoding/decoding to/from JPEG, etc.) -- much slower than a native implementation in C would be on the same architecture. Looking at the code, it looks to me like the culprit is the quirky way this sort of code has to be written to work with Dart numerical types, causing the compiler to generate inefficient code, because wider types than necessary are often needed, special handling is needed for 64-bit types, etc.

https://github.com/brendan-duncan/image

lukehutch, Aug 29 '23

I just looked at the source code of your image library, and can fully appreciate how difficult it is to do the necessary numeric calculations at the necessary precision. I have a similar situation with an audio library I'm writing, where the precision of the int and float datatypes is mandated by the audio format (just as yours is mandated by the image format), and so you sort of need to jump through hoops to implement the algorithm accurately for the required datatype.

It's true that Dart is not C, and we have FFI, but I would rather use Dart because I want the audio pipeline to be pluggable with user-defined dart functions, and it's just easier to manage that without going back and forth to C, plus I wouldn't expect user-defined functions in the pipeline to be written in C.

I expect and accept some tradeoff in performance for any language that's not C/Zig/Rust/..., but on the other hand, improvements to typed_data can't be a bad investment. Adding Int32, Int16, etc. sounds good. As does adding more complete SIMD support. When you look at the compiled code, you've got to think better is still possible in years to come, so I am fine accepting a performance tradeoff in Dart now while knowing that Dart can be made more efficient in the future. (And there are other ideas percolating such as #4271 which would also benefit me.)

P.S. currently my Dart code is around 2.7x slower than the same algorithm in C, but that's good enough for now because it's actually still 20x faster than the bundled native algorithm with Android.

ryanheise, Jul 31 '25

uint8 is very much needed! Such a waste of space.

Kemerd, Oct 01 '25

@Kemerd Check dart:typed_data and typed_data.

mateusfccp, Oct 02 '25

@mateusfccp as already discussed in this thread, we have typed data, but what we don't have is Uint8. So if you're dealing with a list of integers, you're covered. But if you're dealing with a list of objects with integer fields, you're not covered. @Kemerd is concerned about space, and that's a legitimate concern.

@lrhn wrote:

Dart is not C. You cannot control the memory layout. Using smaller integer values won't necessarily save space.

Although we're not limited to just these two extreme options of 1. you completely control the layout of an object (like C), or 2. you have no control over the layout.

For instance, it's totally fine for the Dart compiler to be somewhere in the middle, and to want to optimise the layout, and that could even involve reordering the fields. But surely adding types like Uint8 would HELP the compiler to optimise the layout and yes, save a lot of memory.

ryanheise, Oct 02 '25

For instance, it's totally fine for the Dart compiler to be somewhere in the middle, and to want to optimise the layout, and that could even involve reordering the fields. But surely adding types like Uint8 would HELP the compiler to optimise the layout and yes, save a lot of memory.

The VM would be unable to make use of this optimization without major changes. At the moment we have a convenient situation where pretty much every field takes the same amount of space in an object. Since every field is more-or-less just a slot, we can essentially iterate through objects as if they're arrays. This is handy in the GC, for example.

Also, ints are objects, so by default they're boxed (i.e. the slot has a reference to a separate int object, rather than storing the int inline). When they're small enough to fit in the slot without colliding with pointer tagging they're called SMIs (in a 64 bit build this is ints in the range -2^62 to 2^62 - 1), and we can inline them at runtime. Under the current architecture we'd still default to boxing the uint8s, so when you inline them you'd stick them in that 64-bit slot. You've still got a bunch of wasted space there, unless you change the memory layout of the object at runtime.

These are solvable problems, but they're a lot more work than they seem.

liamappelbe, Oct 02 '25

These are solvable problems, but they're a lot more work than they seem.

That's fair, there will be priorities at play.

But Uint8 could still be introduced even if the compiler just stores it as a 64bit word in the object, and the compiler could progressively optimise it over time. Putting aside memory layout in objects, I would like to have Uint8 local variables for number crunching with the expected semantics for bitwise arithmetic, and without having to jump through hoops to emulate the desired semantics.

ryanheise, Oct 02 '25

I don't like the idea of relying on the compiler to optimize code generation for library-defined types (unless libraries can directly add code generation logic for their types). The compiler should be agnostic to library-defined functions and types. Ints and floats of different widths need to be core language types, not library-defined types.

lukehutch, Oct 02 '25

@liamappelbe

For instance, it's totally fine for the Dart compiler to be somewhere in the middle, and to want to optimise the layout, and that could even involve reordering the fields. But surely adding types like Uint8 would HELP the compiler to optimise the layout and yes, save a lot of memory.

The VM would be unable to make use of this optimization without major changes. At the moment we have a convenient situation where pretty much every field takes the same amount of space in an object. Since every field is more-or-less just a slot, we can essentially iterate through objects as if they're arrays. This is handy in the GC, for example.

That's not correct. We do in fact unbox double and int fields in objects. It's a waste of time and space to box them, especially in Dart 3 where we no longer need to worry about int being nullable, but we[^1] originally implemented field unboxing based on global type inference, so it did not even need to rely on static types and NNBD. There are some limitations to it, but it works for most cases where it matters.

But surely adding types like Uint8 would HELP the compiler to optimise the layout and yes, save a lot of memory.

I am not sure uint8 are that pervasive to actually save a lot of memory. For it to actually matter you need to have a lot of smaller-sized fields in an object, and you need to allocate a lot of objects of this type. Just replacing one int field with a uint8 field does not save any memory, because object sizes are aligned to 2 words anyway, e.g.:

class A1 { int x; }
class A2 { int8 y; }

Would occupy exactly the same amount of memory. You need something like:

class A1 { int x; int y; int z; } // 32 bytes
class A2 { uint8 x; uint8 y; uint16 z; } // 16 bytes

To actually benefit.

Strictly speaking you can just do:

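class A3 {
  // x in bits 0-7, y in bits 8-15, z in bits 16-31.
  int _storage;
  A3(int x, int y, int z)
      : _storage = (x & 0xFF) | ((y & 0xFF) << 8) | ((z & 0xFFFF) << 16);
  int get x => _storage & 0xFF;
  int get y => (_storage >> 8) & 0xFF;
  int get z => (_storage >> 16) & 0xFFFF;
}
// Also 16 bytes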

If you really have the need to pack things up.

[^1]: It was an intern project hosted by @mkustermann and Victor has written a bachelor thesis about it, so you can read how it works.

mraleph, Oct 02 '25

I am not sure uint8 are that pervasive to actually save a lot of memory.

Various applications would have objects containing either low precision numbers or flag bitfields. For example, objects in some sort of 3D rendering engine or game, or structures used to implement a database, etc. I think there are many applications that have lots of data where optimising memory usage becomes a concern.

ryanheise, Oct 02 '25

If you only have one uint8 in your class, it won't change anything. All the other fields, and the object, will be 8-byte aligned anyway.

You need to have multiple smaller values that can be packed (aligned) into an 8-byte slot before you save anything. And then you could just store those in an int yourself. That's obviously not as convenient.

Or we could introduce bit-fields as a language feature:

class C {
  int {
    bool isValid: 1 = false;
    bool isUpdated: 1 = false;
    int rwx: 3 = 0;
    int counter: ... = 0;
  }
}

That would introduce one int and let you store booleans (one bit) or integers (a number of bits) inside it, in an implementation dependent way.

(But it's also not that hard to do that manually, I've done it many times.)
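Done by hand, the bit-field sketch above might look like the following. The class name `Flags` and the bit positions (isValid in bit 0, isUpdated in bit 1, rwx in bits 2 to 4) are illustrative assumptions, and the `counter` field with its unspecified width is left out. Setters clear a field's bits before writing so neighbouring fields are untouched.

```dart
/// Hypothetical manual bit-field packing into a single int.
class Flags {
  int _bits = 0;

  static const _validMask = 0x1; // bit 0
  static const _updatedMask = 0x2; // bit 1
  static const _rwxShift = 2;
  static const _rwxMask = 0x7 << _rwxShift; // bits 2-4

  bool get isValid => (_bits & _validMask) != 0;
  set isValid(bool v) =>
      _bits = v ? (_bits | _validMask) : (_bits & ~_validMask);

  bool get isUpdated => (_bits & _updatedMask) != 0;
  set isUpdated(bool v) =>
      _bits = v ? (_bits | _updatedMask) : (_bits & ~_updatedMask);

  /// 3-bit field; values outside 0..7 keep only their low 3 bits.
  int get rwx => (_bits & _rwxMask) >> _rwxShift;
  set rwx(int v) =>
      _bits = (_bits & ~_rwxMask) | ((v << _rwxShift) & _rwxMask);
}
```

Setting `rwx = 9` stores 1 (the low 3 bits of `0b1001`) and leaves `isValid` and `isUpdated` unchanged.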

lrhn, Oct 02 '25

If you only have one uint8 in your class, it won't change anything.

I know, but I provided examples, e.g. low resolution x + y coordinate + flags.

But it's also not that hard to do that manually, I've done it many times.

Maybe, but your example doesn't show real-world use, where those ints need to be manipulated in (for example) 8-bit unsigned semantics. I can't count the number of times I've introduced a bug trying to emulate bitwise arithmetic for an 8-bit value (signed or unsigned) inside a signed 64 bit int. When I finally track down the bug and fix it, the code ends up correct but overly complicated and inefficient due to the extra operations to handle masking and the sign bit.
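For the signed-byte case specifically, dart:core's `int.toSigned` and `int.toUnsigned` handle sign extension and re-masking in one call each. The helper names below are hypothetical. This does not remove the conversions ryanheise describes; it only centralizes them so the sign-bit handling is written once.

```dart
/// Hypothetical helpers for treating a byte (0..255, e.g. read from a
/// Uint8List) as a signed 8-bit value and writing the result back.
int asInt8(int byte) => byte.toSigned(8); // 0xF0 -> -16
int asByte(int value) => value.toUnsigned(8); // -4 -> 0xFC

/// Arithmetic (sign-preserving) right shift with int8 semantics.
int sar8(int byte, int shift) => asByte(asInt8(byte) >> shift);
```

So `sar8(0xF0, 2)` yields `0xFC` (the sign bit is replicated), while `sar8(0x70, 2)` yields `0x1C`.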

ryanheise, Oct 02 '25

I can't count the number of times I've introduced a bug trying to emulate bitwise arithmetic for an 8-bit value (signed or unsigned) inside a signed 64 bit int.

I agree with this... It is much better from a code complexity, readability, and efficiency point of view to let the compiler handle these sorts of things. Code that is full of masks and shifts and Boolean operations just to pack/unpack bytes from words is always a pain to write and to get right.

My main concern is that I want to be able to quickly process binary data such as pixels, without having to worry about things like converting between Uint8List and List<int> etc., which can introduce massive inefficiencies.

The Dart image library is horrendously slow, despite being highly optimized. Take a look at the code there for an idea of the types of operations that need to be highly optimizable by the compiler. You will also see a ton of ugly hacks in there that the author had to implement to work around the utterly weird way that numbers are implemented in Dart.

I would love to see Dart throw away its Javascript legacy at this point (specifically the number types, and isolates), and start working more like a bare-metal language.

lukehutch, Oct 02 '25