pony-tutorial icon indicating copy to clipboard operation
pony-tutorial copied to clipboard

The description of the behaviour of "default" and unsafe arithmetic is unclear, especially the term wrap around

Open fdsc opened this issue 5 years ago • 11 comments
trafficstars

In https://tutorial.ponylang.io/expressions/arithmetic.html

  • | add() | wrap around on over-/underflow +~ | add_unsafe() | Overflow E.g. I32.max_value() +~ I32(1)

But I see var x10 = U32.max_value() + 1 env.out.print("U32.max_value() + 1: " + x10.string()) // print 0

x10 = U32.max_value() +~ U32(1) env.out.print("U32.max_value() +~ 1: " + x10.string()) // 4294967295

That is, "+~" is performed without overflow, but "+" - with overflow

ponyc -v 0.33.2-aff95ec [release] (windows 7 x64)

fdsc avatar Feb 28 '20 17:02 fdsc

You seem to have an expectation of what should be happening? +~ over the max in your example is undefined behavior.

The current description says that the result of an overflow is undefined. That is what you are seeing. +~ isn't performed without overflow, it is undefined what will happen when it overflows. It can vary by platform, including overflowing as you might expect and wrapping around.

What your snippet from the docs above doesn't include is:

"Undefined in case of" before "Overflow E.g. I32.max_value() +~ I32(1)" so the full statement is:

"Undefined in case of Overflow E.g. I32.max_value() +~ I32(1)"

SeanTAllen avatar Feb 28 '20 17:02 SeanTAllen

U32.max_value() + 1 == 0 is a native result for the Intel x64 Overflow occured at safe operation. Incorrect operation.


BUT U32.max_value() +~ 1

The unsafe fast operation by tutorial ended without a native result The unsafe operation makes no overflow U32.max_value() +~ U32(1) must equal "0" for the Intel x64 (but 4294967295)

Don't you find it strange? The unsafe operation ends safely, and the safe operation ends with an overflow?

fdsc avatar Feb 28 '20 17:02 fdsc

  • is not safe in the way you are using it. it is merely predictable. +~ is unsafe because it is undefined what will happen, not because you will get a wrong result. +? is probably what you mean by "safe".

As detailed later in that page +? will error on overflow or underflow.

Please note nothing in the page says that "+" is safe. It says it is "default arithmetic".

SeanTAllen avatar Feb 28 '20 18:02 SeanTAllen

Please reread what I wrote above before reading below (yes, safe is "default arithmetic" ) I assume also that you know the assembler for the Intel x64 (x86-x64; AMD64 and other names) architecture


Operator | Method | Description

  • | add() | wrap around on over-/underflow

It says here that this operator has a wrapper. "tries to balance the needs for performance and correctness"

That is, this operator is safe in the sense that it has a wrapper. That is, it should not just the simple overflow

But. Something like this happens mov rax, 4294967295 add rax, 1

That's all (I didn't look in the debugger to see if it looks like this)

This is the first contradiction


For "+~" "No checks, raw speed, possibilities of overflow, underflow or division by zero"

This assumes just the native addition operation What we saw above

However, instead we see the result of something like this mov rax, 4294967295 mov rdx, rax add rax, 1 jnc success

mov rax, rdx success:

So here we see traces of the wrapper. But there should be no wrappers. This is raw speed! But the wrapper is there, although it should not be


Try it on your version of ponyc, will you have this behavior? U32.max_value() +~ U32(1) == ??? U32.max_value() + U32(1) == ???

fdsc avatar Feb 28 '20 18:02 fdsc

I see a source of confusion. In the case of "wrap around on over-/underflow"; we will need to clarify that. "wrap" there means literally if you consider the representable range of numbers to be circular, it will wrap around, that is: MAX + 1 = the lowest value representable for the type.

It doesn't mean there is some wrapper the current description led you to believe.

The description means the opposite of what you are taking it to mean. The intent is to convey that it will wrap on overflow or underflow, not that there is a wrapper. It will do as you put it "a simple underflow".


+~ makes no guarantees at all. It is dependent on LLVM functions that declare themselves as undefined behavior in the case of overflow, underflow, etc. Those LLVM functions are the "unsafe" variants that it provides.


I can also see how the description of unsafe might have lead to your conclusions. @mfelsche, I think part of the issue here is that the description of unsafe makes it sound like default is doing something different and that isn't necessarily the case. But the presentation suggests that.

Really, we want to be contrasting checked/partial with unsafe. But the current reading makes unsafe sound like it is directly contrasted against default and therefore default must be doing something safe.

I think we need to rewrite.

SeanTAllen avatar Feb 28 '20 18:02 SeanTAllen

This would be possible if not for the behavior of the operation "+~"

The problem is unsafe "+~" it should not contain any wrappers. It should make the fine machine operation of addition.

But "+~" doesn't do it. Instead, it does what I wrote above

fdsc avatar Feb 28 '20 18:02 fdsc

I have the impression that there are two problems here


  1. The compiler or LLVM does not work correctly under Windows or at all

Since it is obvious that "+~" should give the machine addition command. But does not give (question, only if I have?)


  1. There is no clear explanation in the tutorial text

Perhaps it is better to say that the default operations give the same thing on different architectures And this is similar to the behavior of Intel x64

This is especially true for the "neg" command. Not a wrap around. neg(-128) == -128 for U8

fdsc avatar Feb 28 '20 18:02 fdsc

So, if you look into the generated IR (obtained from compiling some pony code with ponyc --debug --pass=ir .) it is clear that:

  • + creates a plain LLVM add instruction https://releases.llvm.org/9.0.0/docs/LangRef.html#add-instruction
    • thus it is wrapping around:

If the sum has unsigned overflow, the result returned is the mathematical result modulo 2n, where n is the bit width of the result.

  • +~ creates an LLVM add instruction with the nuw flag (No unsigned wrap)
    • this will result in LLVM emitting a poison value on overflow, leading to undefined behaviour as the tutorial and Sean explained.

I am not an expert on x86 assembler and i am unable to make sense of the assembler emitted when calling ponyc as ponyc --pass=asm . as the debug info is stripped. Beware that the LLVM passes we use will do several optimizations on the generated IR, including inlining constant folding etc etc. especially in case of undefined behaviour, the results might be surprising.

I would say that if you think that ponyc is not emitting correct assembler, please investigate further on your machine, look into the generated assembler, and create an issue on the ponyc repo ( https://github.com/ponylang/ponyc/issues ).

I am closing this issue as this is not a matter of the tutorial imho.

mfelsche avatar Feb 28 '20 19:02 mfelsche

I think this is a textbook question too. I agree with what SeanTAllen indicated above.

The words about security compromise are not clear. They're confusing. The words "wrap around" are also very confusing. In LLVM, in this sense, everything is described correctly. In addition, the words "wrap around" are clearly incorrect when describing the "neg" operation"

fdsc avatar Feb 28 '20 20:02 fdsc

Ok, if we describe this issue as: "The description of the behaviour of "default" arithmetic is unclear, especially the term wrap around." This issue makes sense here, and I agree with your point. Does that match with what you have in mind?

Please feel free to edit the title accordingly.

mfelsche avatar Feb 28 '20 20:02 mfelsche

Yes

  1. It would be nice to add that we are talking about adding by module.
  2. Complement the describe of the concept more clearly, what is the difference in the default concept compared to unsafe

fdsc avatar Feb 28 '20 20:02 fdsc