onnx-mlir Passing int32 parameters to external function calls at the LLVM level causes wrong generated code on zLinux

Note: This error does not happen with the current master branch of onnx-mlir which is currently passing int64 instead of int32 parameters to external functional calls.

This issue seems to be a limitation in MLIR where integers are signless and there is no way so far to annotate their signness so that lower-level tools like opt/llc can use to correctly generate assembly code.

Detail discussion about this error is in this PR #567

Jul 15 '21 02:07 tungld

I have long advocating only using 64 bit integers. There are no reasons to limit ourselves to 32 bits, and that way this entire problem is avoided. I would change all int parameters in the RT to 64 bit and be done with it. The sooner the better.

Jul 20 '21 15:07 AlexandreEichenberger

I still maintain the position that the type of the variables should be determined by the semantic of the variables, not because using a particular type can seemingly get around some problem that should have been fixed. We can use 64 bit integers in place of all 32 bit integers, but what about short, char, and boolean type parameters? We can't replace them all with int64_t. The fundamental problem is the zeroext/signext marker not being generated, changing all int32_t to int64_t is only a partial workaround.

Jul 20 '21 15:07 gongsu832

@gongsu832 @AlexandreEichenberger I agree with Alex. Any scalar quantity that is used as an integer should be the largest available size. This is the philosophy we used in our compilers for the last 30 to 40 years. I agree that if someone uses the shorter types, MLIR should be fixed to handle the correct extension, but that is an orthogonal concern in my opinion.

Jul 20 '21 21:07 caoimhinuibrian

@caoimhinuibrian So I'm curious about the rationale behind this philosophy. As a systems person, this is rather counter intuitive to me. The largest available size obviously depends on the platform, some int32, some int64, and some int128. Does this philosophy apply to floating point as well?

Jul 21 '21 00:07 gongsu832

@gongsu832 integers are represented by bit patterns that follow 2's complement arithmetic, so other than where they wrap (modulo), the answers are the same for different sizes. Floating point is completely different. Operating on shorter integers is often more expensive (masking operations etc) so from a compiler perspective it's usually best to just go with whatever the register size is on the machine. Today, I think all machines of interest have 64 bit regs. If there is a specific case where the smaller size gives an advantage, then we could special case it, but in general its a waste of intellectual effort to keep track of all these sizes if there is no compelling reason to do so.

Jul 21 '21 16:07 caoimhinuibrian

@caoimhinuibrian Thanks for the clarification. If I understand you correctly, this "using int64 for all integers" (assuming 64 bit registers) applies to just the "internal" lowering passes, correct? i.e., code written in high level languages can obviously use whatever data types there are like char, short, int, etc. But the compiler internally could represent all of them with int64 in various passes and only at the last moment when it needs to generate machine instructions it could then apply the correct original type to make sure the right instructions generated. Is that sort of the gist of how it works? Thanks.

Jul 22 '21 00:07 gongsu832

The same error was reported here in LLVM bugs: https://bugs.llvm.org/show_bug.cgi?id=51898

It seems there is no solution so far except using int64 as we are doing.

Sep 30 '21 04:09 tungld

@tungld is this issue resolved? Has the transition to int64 been done (if that is the preferred solution)

Nov 03 '21 20:11 AlexandreEichenberger

We are using int64 at this moment and there is no problem in onnx-mlir. However, we still want to keep this opened until MLIR can correctly generate Z instructions annotated with zeroext/signext.

Nov 04 '21 01:11 tungld

onnx-mlir onnx-mlir copied to clipboard

Passing int32 parameters to external function calls at the LLVM level causes wrong generated code on zLinux

onnx-mlir
onnx-mlir copied to clipboard