
Integer ideas

Open plajjan opened this issue 2 years ago • 1 comment

I think we have the basic integer types we want now:

  • int, which is an arbitrary-sized integer
  • i64, i32, i16, which are fixed-size signed integers
  • u64, u32, u16, which are fixed-size unsigned integers

Integer literals will be int. Built-in functions like range() or print("something %d") accept int arguments as things currently stand.
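To make the int vs fixed-size distinction concrete, here is a small Python sketch (not Acton) contrasting arbitrary-precision arithmetic with a u64 simulated by masking to 64 bits; `u64_add` is a made-up helper for illustration:

```python
# Python sketch (not Acton): arbitrary-precision int vs a simulated
# fixed-size u64, modelled by masking results to 64 bits.
U64_MASK = (1 << 64) - 1

def u64_add(a, b):
    # Fixed-size unsigned addition wraps around at 2**64.
    return (a + b) & U64_MASK

big = 2**64 - 1
print(big + 1)          # arbitrary-precision: 18446744073709551616
print(u64_add(big, 1))  # fixed-size u64 wraps around to 0
```

The wrap-around in the last line is exactly the silent behaviour a novice is protected from when literals default to int.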

Note that this issue is deliberately not about whether int should stand for something else; even if we changed integer literals to default to i64, this issue presumes that the current names are used.

We think we'd like to improve things a bit around integer handling through the following goals:

  • it is a good thing that a novice programmer, who has no knowledge of types, can perform arbitrary-precision math in acton

    • with integer literals defaulting to int, this is possible
    • if we change integer literals to default to something else, one would have to explicitly specify the type int
    • an alternative offering the same level of "ease of use" would be to default to, say, i64 and have overflow detection, so a runtime exception can tell the user to change to int
      • but this is hard because we likely want a ReleaseFast optimization mode that does not do integer overflow detection
  • it should be possible to write faster programs by using fixed-size integer types

    • someone with a bit of understanding of computers, memory and types should be able to judge "this program is fine running with a u64 instead of an arbitrary-sized int", provide the necessary type annotations, and the program should run faster
      • one should NOT have to change function calls; if range() is used, we should not have to call a separate range_i64() or similar
      • preferably this runs with no run-time overhead, e.g. through monomorphization
    • in the future we will also be able to unbox fixed-size integers, which won't be possible for the arbitrary-sized int
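The i64-with-overflow-detection alternative from the goals above could be sketched as follows (Python, not Acton; `checked_add_i64`, `IntegerOverflow` and the error message are made up for illustration):

```python
# Python sketch of "default to i64 with overflow detection": a checked
# add that raises a runtime error suggesting the user switch to int.
I64_MIN, I64_MAX = -(1 << 63), (1 << 63) - 1

class IntegerOverflow(Exception):
    pass

def checked_add_i64(a, b):
    r = a + b
    if r < I64_MIN or r > I64_MAX:
        raise IntegerOverflow(
            "i64 overflow; consider declaring the variable as int")
    return r
```

The catch, as noted above, is that a ReleaseFast build would want to compile the range check away, silently reintroducing wrap-around.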

Some ideas for a solution

  • generic range() & friends
  • some form of relationship between int and i64 & friends
  • does type inference differentiate between explicitly typed stuff and inferred types?

This is what the naive program would look like today:

def foo():
    a = 123
    for i in range(0, a, 1):
        print("i: %d  a: %d" % (i, a))

In this program, a is an int and, as today, range() accepts int arguments. Similarly, the print() format character %d takes int input.

Then we want our program to run faster, so we do:

def foo():
    a: u64 = 123
    for i in range(0, a, 1):
        print("i: %d  a: %d" % (i, a))

Now a is a u64. Let's say range() is actually a generic function range(A, A, A) -> A, and through monomorphization we produce one instance for int, another for u64, and so on. Since a is explicitly typed as u64 and the other arguments are non-explicitly typed integer literals, we can infer that they too should be u64. Without any type inference saying otherwise, all integer literals are assumed to be of type int.
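Acton's inference would of course happen statically; as a rough dynamic analogue, here is a Python sketch where a generic range-like function adopts the type of its one explicitly typed argument, coercing the plain int literals to it (`grange` and the `u64` wrapper class are hypothetical stand-ins, not Acton's runtime):

```python
# Python sketch: a generic range where untyped integer literals adopt
# the type of the explicitly typed argument, mimicking inference.
class u64(int):
    def __new__(cls, v):
        return super().__new__(cls, v & ((1 << 64) - 1))

def grange(start, stop, step):
    # Pick the most specific integer type among the arguments; plain
    # int arguments (the "literals") are coerced to it.
    ty = next((type(a) for a in (start, stop, step)
               if type(a) is not int), int)
    i = ty(start)
    while i < stop:
        yield i
        i = ty(i + step)

a = u64(3)
assert all(type(i) is u64 for i in grange(0, a, 1))
```

With monomorphization, the Acton compiler would instead emit a dedicated u64 instance of range at compile time, so no such runtime dispatch is needed.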

print("%d") should accept any int type. This feels like a fairly no-brainer kind of thing; couldn't we do this right away without changing anything else? I guess it means a bunch of isinstance checks internally, or so?
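One way to picture the "bunch of isinstance checks" is a Python sketch where every fixed-size type is modelled as a subclass of int (`fmt_d`, `i32` and `u64` here are illustrative stand-ins, not Acton's actual runtime):

```python
# Python sketch: a %d formatter that accepts any integer type via a
# single isinstance check, assuming fixed-size types subclass int.
class i32(int): pass
class u64(int): pass

def fmt_d(value):
    if not isinstance(value, int):  # covers int, i32, u64, ...
        raise TypeError("%d expects an integer type")
    return str(int(value))
```

As the reply below points out, the cleaner long-term answer is to express "any int type" in the type system itself via subtyping, rather than checking at runtime.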

A consequence of this is that any explicit type overrides others and affects the inference of types for related or dependent variables. But this is fine, right? It's possible to write a program that relies on integer literals defaulting to int, but then calls a library that only works with u64; since the library has explicit type signatures, one of its return values will "poison" our program and propagate the u64 type everywhere. Is that ultra-bad? That's just type inference in general, right?

import mathu64

def foo():
    a = 123 + mathu64.funfunc(456)
    for i in range(0, a, 1):
        print("i: %d  a: %d" % (i, a))

Here we intend a to be an arbitrary-sized int, i.e. int, but since we use the mathu64 library, which only accepts and returns u64 values (explicitly specified in its type signatures), a becomes a u64, and thus in the call to range() the other integer literal arguments are also treated as u64. I suppose this is acceptable, right? One could fix it by explicitly setting the type of a to int.
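The "poisoning" can be mimicked dynamically in Python: a u64 result makes the whole surrounding expression u64 (`funfunc` and the `u64` class are stand-ins for the hypothetical mathu64 library, not real Acton code):

```python
# Python sketch: a u64-returning library call "poisons" the expression,
# so the inferred type of a becomes u64 rather than plain int.
class u64(int):
    def __radd__(self, other):
        # int + u64 resolves here (subclass reflected op wins in Python),
        # so the result stays u64.
        return u64(int(other) + int(self))

def funfunc(x):            # stand-in for mathu64.funfunc
    return u64(x * 2)

a = 123 + funfunc(456)
assert type(a) is u64      # the library's explicit u64 propagated into a
```

The explicit-annotation fix corresponds to forcing the result back, e.g. `a = int(123 + funfunc(456))` in this sketch.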

plajjan avatar Jun 18 '23 23:06 plajjan

I agree with the goals and the problem description, with some comments inlined.

-- Johan

On 19 Jun 2023, at 01:08, Kristian Larsson @.***> wrote:

> I think we have the basic integer types we want now:
>
>   • int, which is a arbitrary sized integer
>   • i64, i32, i16 are fixed sized integers
>   • u64, u32, u16 are fixed sized unsigned integers
>
> Integer literals will be int.

That's not quite true. Currently we're overloading the integer literals so that each occurrence can stand for any of the types above, and float as well.

But I think it's becoming obvious that, with all the new integer types, subtyping is a better way to achieve literal flexibility. More on this below.

> Built-in functions like range() or print("something %d") accept int arguments as things currently stand. Note how this issue is deliberately not about whether int should stand for something else, like even if we change so that integer literals default to being i64, this issue presumes that the current names are used.
>
> We think we'd like to improve things a bit around integer handling through the following goals:
>
>   • it is a good thing that a novice programmer, that have no knowledge of types, can perform arbitrary precision math in acton
>     • with integer literals defaulting to int, this is possible
>     • if we change integer literals to default to something else, one would have to explicitly specify type int
>     • an alternative for same level of "ease of use" would be to default to like i64 and have overflow detection, so it's a runtime exception that can tell the user to change to int
>       • but this is hard because we likely want a ReleaseFast optimization that does not do integer overflow detection
>   • it should be possible to write faster programs by using fixed size integer types
>     • someone with a bit of understanding of computers, memory and types should be able to judge "this program is fine running with an u64 instead of arbitrary sized int, provide the necessary types and the program should run faster
>       • one should NOT have to change function calls, like if a range() is used, we should not have to call a separate range_i64() or similar
>       • preferably run with no run time overhead, through like monomorphization or so
>     • in the future we will also be able to unbox fixed size integers, which won't be the case for arbitrary sized int
>
> Some ideas for a solution
>
>   • generic range() & friends

Yes! But just a remark: doing this in isolation will currently just make some programs slower. It needs to be combined with the inlining and compile-time reduction of witnesses we've also talked about to be effective.

> • some form of relationship between int and i64 & friends

Here I propose switching to subtyping, and to remove the literal overloading. Currently the type system actually allows the literal 1000 to be assigned any of our integer types as well as float, which of course is wrong for i8 and u8. Better then to give each literal the smallest enclosing integer type, and rely on subtyping to make them fit their context.

(Side remark: We'll probably have to introduce the "helper" type u7 to address the fact that i8 and u8 have values in common even though they are not subtype related. Otherwise we would have no smallest type to give to literals in the range 0..127. Similar for u15 and u31.)

(Side remark 2: Or we could actually repeat the approach we take for string literals and give each literal their own singleton type! So 42 would have type 42, which is its smallest possible type. But via subtyping it also has types i8, u8, i16, etc.)
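The "smallest enclosing type" idea, including the helper types u7/u15/u31 from the side remarks, could be sketched as a lookup over candidate ranges ordered smallest-first (a simplification of the real subtype lattice, since e.g. i8 and u8 are not subtype-related; the ordering below is an assumption for illustration):

```python
# Python sketch: assign each integer literal the smallest enclosing
# type, with helper types u7/u15/u31 covering the i/u overlaps.
TYPES = [  # (name, lo, hi), ordered smallest-first
    ("u7",  0, 127), ("i8", -128, 127), ("u8", 0, 255),
    ("u15", 0, 32767), ("i16", -32768, 32767), ("u16", 0, 65535),
    ("u31", 0, 2**31 - 1), ("i32", -(2**31), 2**31 - 1),
    ("u32", 0, 2**32 - 1), ("i64", -(2**63), 2**63 - 1),
    ("u64", 0, 2**64 - 1), ("int", None, None),  # int: unbounded
]

def smallest_type(n):
    for name, lo, hi in TYPES:
        if lo is None or lo <= n <= hi:
            return name

assert smallest_type(42) == "u7"     # fits both i8 and u8 via u7
assert smallest_type(1000) == "u15"  # not an i8/u8 value
```

Subtyping then lets a u7 literal flow into any context expecting i8, u8, or anything larger.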

> • does type inference differentiate between explicitly typed stuff and inferred types?

Only when the solver takes defaulting decisions. But with an explicit signature the solver typically doesn't have to.

> This is what the naive program would look like today:
>
>     def foo():
>         a = 123
>         for i in range(0, a, 1):
>             print("i: %d  a: %d" % (a, i))
>
> In this program, a is a int and like today range() accepts int args. Similarly, print() format char %d takes int input. Then we want our program to run faster, so we do:
>
>     def foo():
>         a: u64 = 123
>         for i in range(0, a, 1):
>             print("i: %d  a: %d" % (a, i))
>
> As a is now a u64. Let's say range() is actually a generic function range(A, A, A) -> A

I think the complete qualified type should be

range : [A from int] => (A,A,A) -> A

> and through monomorphization we produce one for int, another for u64 etc etc. Since a is a explicitly typed as u64 and the other arguments are non-explicitly typed integer literals, we can infer that they too should be u64. Without any type inference saying otherwise, all integer literals are assumed to be the int type.

Not quite, they are assumed to be of the integer type the context expects. Which is a bit too liberal nowadays, but fixable (see above).

> print("%d") should accept any int type. This feels like a fairly no-brainers kind of thing, couldn't we do this right away without changing anything else? I guess it means a bunch of isinstance checks internally or so?

Well, currently we have no way of expressing "any int type" in the type system! But once we switch to subtype-related integers, we will.

> A consequence of this is that any explicit type overrides others and affects the inference of types for related or dependent variables. But this is fine, right? It's possible that we write a program where we rely on the default of integer literals being int, but then make a call to a library that only works with u64 and as it has explicit type signatures, now one of its return values will "poison" our program and propagate the u64 type everywhere. Is that ultrabad? That's just type inference in general, right?

Indeed!

> import mathu64
>
> def foo():
>     a = 123 + mathu64.funfunc(456)
>     for i in range(0, a, 1):
>         print("i: %d  a: %d" % (a, i))
>
> Here we intend to have a be a arbitrary sized int, i.e. int, but as we make use of the mathu64 library which only accepts and returns u64 values, and that is explicitly specified in its type signatures, a becomes a u64 and thus in the call to range, the other integer literal arguments are also treated as u64. I suppose this is acceptable, right? One could fix it by explicitly setting the type of a to int.

The story is like this: funfunc returns a u64 and 123 can be a u64, so + must produce a u64 and hence a is a u64. And since 0 and 1 can also be of type u64, the result of range, bound to i, is a u64.

So your intuition is correct, I'm just adding that each integer literal is treated in isolation when the types are inferred.

-- Johan

nordlander avatar Jun 21 '23 10:06 nordlander