pony-tutorial
pony-tutorial copied to clipboard
The description of the behaviour of "default" and unsafe arithmetic is unclear, especially the term wrap around
In https://tutorial.ponylang.io/expressions/arithmetic.html
- | add() | wrap around on over-/underflow +~ | add_unsafe() | Overflow E.g. I32.max_value() +~ I32(1)
But I see var x10 = U32.max_value() + 1 env.out.print("U32.max_value() + 1: " + x10.string()) // print 0
x10 = U32.max_value() +~ U32(1) env.out.print("U32.max_value() +~ 1: " + x10.string()) // 4294967295
That is, "+~" is performed without overflow, but "+" - with overflow
ponyc -v 0.33.2-aff95ec [release] (windows 7 x64)
You seem to have an expectation of what should be happening? +~ over the max in your example is undefined behavior.
The current description says that the result of an overflow is undefined. That is what you are seeing. +~ isn't performed without overflow, it is undefined what will happen when it overflows. It can vary by platform, including overflowing as you might expect and wrapping around.
What your snippet from the docs above doesn't include is:
"Undefined in case of" before "Overflow E.g. I32.max_value() +~ I32(1)" so the full statement is:
"Undefined in case of Overflow E.g. I32.max_value() +~ I32(1)"
U32.max_value() + 1 == 0 is a native result for the Intel x64 Overflow occured at safe operation. Incorrect operation.
BUT U32.max_value() +~ 1
The unsafe fast operation by tutorial ended without a native result The unsafe operation makes no overflow U32.max_value() +~ U32(1) must equal "0" for the Intel x64 (but 4294967295)
Don't you find it strange? The unsafe operation ends safely, and the safe operation ends with an overflow?
- is not safe in the way you are using it. it is merely predictable. +~ is unsafe because it is undefined what will happen, not because you will get a wrong result. +? is probably what you mean by "safe".
As detailed later in that page +? will error on overflow or underflow.
Please note nothing in the page says that "+" is safe. It says it is "default arithmetic".
Please reread what I wrote above before reading below (yes, safe is "default arithmetic" ) I assume also that you know the assembler for the Intel x64 (x86-x64; AMD64 and other names) architecture
Operator | Method | Description
- | add() | wrap around on over-/underflow
It says here that this operator has a wrapper. "tries to balance the needs for performance and correctness"
That is, this operator is safe in the sense that it has a wrapper. That is, it should not just the simple overflow
But. Something like this happens mov rax, 4294967295 add rax, 1
That's all (I didn't look in the debugger to see if it looks like this)
This is the first contradiction
For "+~" "No checks, raw speed, possibilities of overflow, underflow or division by zero"
This assumes just the native addition operation What we saw above
However, instead we see the result of something like this mov rax, 4294967295 mov rdx, rax add rax, 1 jnc success
mov rax, rdx success:
So here we see traces of the wrapper. But there should be no wrappers. This is raw speed! But the wrapper is there, although it should not be
Try it on your version of ponyc, will you have this behavior? U32.max_value() +~ U32(1) == ??? U32.max_value() + U32(1) == ???
I see a source of confusion. In the case of "wrap around on over-/underflow"; we will need to clarify that. "wrap" there means literally if you consider the representable range of numbers to be circular, it will wrap around, that is: MAX + 1 = the lowest value representable for the type.
It doesn't mean there is some wrapper the current description led you to believe.
The description means the opposite of what you are taking it to mean. The intent is to convey that it will wrap on overflow or underflow, not that there is a wrapper. It will do as you put it "a simple underflow".
+~ makes no guarantees at all. It is dependent on LLVM functions that declare themselves as undefined behavior in the case of overflow, underflow, etc. Those LLVM functions are the "unsafe" variants that it provides.
I can also see how the description of unsafe might have lead to your conclusions. @mfelsche, I think part of the issue here is that the description of unsafe makes it sound like default is doing something different and that isn't necessarily the case. But the presentation suggests that.
Really, we want to be contrasting checked/partial with unsafe. But the current reading makes unsafe sound like it is directly contrasted against default and therefore default must be doing something safe.
I think we need to rewrite.
This would be possible if not for the behavior of the operation "+~"
The problem is unsafe "+~" it should not contain any wrappers. It should make the fine machine operation of addition.
But "+~" doesn't do it. Instead, it does what I wrote above
I have the impression that there are two problems here
- The compiler or LLVM does not work correctly under Windows or at all
Since it is obvious that "+~" should give the machine addition command. But does not give (question, only if I have?)
- There is no clear explanation in the tutorial text
Perhaps it is better to say that the default operations give the same thing on different architectures And this is similar to the behavior of Intel x64
This is especially true for the "neg" command. Not a wrap around. neg(-128) == -128 for U8
So, if you look into the generated IR (obtained from compiling some pony code with ponyc --debug --pass=ir .) it is clear that:
+creates a plain LLVMaddinstruction https://releases.llvm.org/9.0.0/docs/LangRef.html#add-instruction- thus it is wrapping around:
If the sum has unsigned overflow, the result returned is the mathematical result modulo 2n, where n is the bit width of the result.
+~creates an LLVMaddinstruction with thenuwflag (No unsigned wrap)- this will result in LLVM emitting a
poison valueon overflow, leading to undefined behaviour as the tutorial and Sean explained.
- this will result in LLVM emitting a
I am not an expert on x86 assembler and i am unable to make sense of the assembler emitted when calling ponyc as ponyc --pass=asm . as the debug info is stripped. Beware that the LLVM passes we use will do several optimizations on the generated IR, including inlining constant folding etc etc. especially in case of undefined behaviour, the results might be surprising.
I would say that if you think that ponyc is not emitting correct assembler, please investigate further on your machine, look into the generated assembler, and create an issue on the ponyc repo ( https://github.com/ponylang/ponyc/issues ).
I am closing this issue as this is not a matter of the tutorial imho.
I think this is a textbook question too. I agree with what SeanTAllen indicated above.
The words about security compromise are not clear. They're confusing. The words "wrap around" are also very confusing. In LLVM, in this sense, everything is described correctly. In addition, the words "wrap around" are clearly incorrect when describing the "neg" operation"
Ok, if we describe this issue as:
"The description of the behaviour of "default" arithmetic is unclear, especially the term wrap around."
This issue makes sense here, and I agree with your point. Does that match with what you have in mind?
Please feel free to edit the title accordingly.
Yes
- It would be nice to add that we are talking about adding by module.
- Complement the describe of the concept more clearly, what is the difference in the default concept compared to unsafe