Support i64 emulation in the SPIR-V backend.

Open benvanik opened this issue 3 years ago • 1 comments

More users are hitting issues with the i64->i32 truncation, and some actually need the i64 values even when running on 32-bit devices. Unfortunately these i64 types don't always come from users; they are often inserted by things like JAX and PyTorch, and we can't (easily) make them change. Currently SPIR-V is the last thing requiring this truncation, and if we were able to support i64 there we could remove the truncation entirely and never get another issue about it. This doesn't mean i64 has to be fast on 32-bit hardware - just correct - as then if there are performance issues we can redirect them to the frontends :)
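To make the correctness problem concrete, here is a minimal sketch of what truncating an i64 to an i32 does to values that need more than 32 bits. The helper name `truncate_i64_to_i32` is hypothetical and only models the arithmetic; it is not the compiler's actual truncation pass.

```python
MASK32 = 0xFFFFFFFF


def truncate_i64_to_i32(value: int) -> int:
    """Keep only the low 32 bits and reinterpret them as a signed i32."""
    low = value & MASK32
    # If bit 31 is set the i32 is negative in two's complement.
    return low - (1 << 32) if low & 0x80000000 else low


# Two distinct i64 values that become indistinguishable after truncation.
big = 1 << 32          # needs 33 bits
zero = 0
same_after_truncation = truncate_i64_to_i32(big) == truncate_i64_to_i32(zero)

# A positive i64 that flips sign once only 32 bits remain.
wrapped = truncate_i64_to_i32(0x80000000)
```

An equality compare on `big` and `zero` is false for i64 but true once both are truncated, which is the kind of silent wrong answer the truncation can produce.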

Today this test is only passing because we truncate: https://github.com/google/iree/blob/9240f1c789f7f3ac2d62701ef1c6c9856587c509/iree/test/e2e/vulkan_specific/compare.mlir#L56-L65

Removing the truncation by commenting out this block will cause it to fail to compile: https://github.com/google/iree/blob/4a93d2569e44f53b19921e47840cf8576f87bf1d/iree/compiler/Dialect/Flow/Transforms/TypeConverter.cpp#L37-L44 ->

iree/test/e2e/vulkan_specific/compare.mlir:62:13: error: failed to legalize operation 'memref.load'
  %result = "mhlo.compare"(%lhs, %rhs) {comparison_direction = #mhlo<"comparison_direction EQ">} : (tensor<i64>, tensor<i64>) -> tensor<i1>

Seems to call for a pass that takes IR using i64 and expands loads/stores/arith to emulate i64 using i32 operations. That'd be useful for other platforms too and is likely something that we can just do on standard dialects (arith/memref/etc) - shouldn't be too bad, just a bunch of 1990s bit twiddling: https://gist.github.com/csaftoiu/9194ef9ffd98b7b106b359ca55557010
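As a rough illustration of the bit twiddling such a pass would have to generate, the sketch below models an i64 as a pair of unsigned 32-bit words and implements add, compare, and load/store on that pair. All names here are hypothetical, and the (low word first) memory layout is an assumption; this is plain Python modelling the arithmetic, not MLIR and not an actual IREE pass.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def split_i64(value):
    """Split a signed 64-bit integer into (lo, hi) unsigned 32-bit words."""
    bits = value & MASK64
    return bits & MASK32, (bits >> 32) & MASK32


def join_i64(lo, hi):
    """Rebuild the signed 64-bit integer from its two 32-bit words."""
    bits = (hi << 32) | lo
    return bits - (1 << 64) if bits >> 63 else bits


def add_i64(a, b):
    """64-bit add using only 32-bit adds plus a carry."""
    lo = (a[0] + b[0]) & MASK32
    carry = 1 if lo < a[0] else 0  # unsigned wraparound means a carry out
    hi = (a[1] + b[1] + carry) & MASK32
    return lo, hi


def eq_i64(a, b):
    """Equality needs both halves to match."""
    return a[0] == b[0] and a[1] == b[1]


def slt_i64(a, b):
    """Signed less-than: high words compare signed, low words unsigned."""
    def as_signed(word):
        return word - (1 << 32) if word & 0x80000000 else word
    if a[1] != b[1]:
        return as_signed(a[1]) < as_signed(b[1])
    return a[0] < b[0]


def store_i64(mem, index, value):
    """Store one emulated i64 as two consecutive 32-bit words (lo first)."""
    mem[2 * index] = value[0]
    mem[2 * index + 1] = value[1]


def load_i64(mem, index):
    """Load one emulated i64 from two consecutive 32-bit words."""
    return mem[2 * index], mem[2 * index + 1]


# Usage: a carry that crosses the 32-bit boundary, stored and reloaded.
memory = [0] * 4
store_i64(memory, 1, add_i64(split_i64(0xFFFFFFFF), split_i64(1)))
result = join_i64(*load_i64(memory, 1))
```

The comparison case is the one the failing test above exercises: with both halves kept, `eq_i64` distinguishes values that differ only in their upper 32 bits.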

CUDA unconditionally supports i64 and we already get this for free on the CPU path because LLVM does it for us, but if we did it in MLIR we could do it early in backend translation and have tile/distribute/vectorize/etc chew on everything really nicely vs doing it at the SPIR-V level.

As a workaround that would allow us to remove the truncation by default, we could run the current truncation pass behind a flag only when targeting Vulkan and using i64 tensors, though that's a big divergence and bound to cause hard-to-debug issues.

Note that this is just i64 - f64 is fine to push back onto frontends.

Part of #8661.

Mar 31 '22 01:03 benvanik

Android i64 support is nearly non-existent (just some Tegra-like things, probably) and, interestingly, pre-M1 macOS never supported it (but all recent iOS devices and M1 Macs do).

Mar 31 '22 01:03 benvanik