pytorch
TensorExpr eval: fix copying variables from pointers on big endian systems
When copying data from pointers, only the lowest bytes are copied. On little endian systems these bytes are located at the beginning of the pointer-sized value; on big endian systems they are located at the end.
This change fixes the TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.
cc @EikanWang @jgong5
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/96951
:white_check_mark: No Failures
As of commit 256ded05e4a2c3ce542af582b018d065dc5e9bf0: :green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
need lintfix
There's one more issue:
2023-03-17T14:46:37.7006716Z /opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -Iaten/src -I../../aten/src -I. -I../../ -I../../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -I../../torch/csrc -I../../third_party/miniz-2.1.0 -I../../third_party/kineto/libkineto/include -Ivulkan -I../../aten/../third_party/VulkanMemoryAllocator -I../../aten/src/ATen/.. -I../../third_party/FXdiv/include -I../../c10/.. -I../../third_party/pthreadpool/include -I../../third_party/cpuinfo/include -I../../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../../third_party/cpuinfo/deps/clog/include -I../../third_party/NNPACK/include -I../../. 
-I/opt/ndk/sources/third_party/vulkan/src/include -I../../third_party/FP16/include -I../../third_party/fmt/include -I../../third_party/flatbuffers/include -isystem ../../third_party/XNNPACK/include -isystem /opt/ndk/sources/third_party/vulkan/src/common -isystem ../../cmake/../third_party/eigen -isystem ../../caffe2 -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -mfpu=vfpv3-d16 -fno-addrsig -march=armv7-a -mthumb -Wa,--noexecstack -Wformat -Werror=format-security -stdlib=libc++ -frtti -fexceptions -ffunction-sections -fdata-sections -DNO_EXPORT -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_VULKAN_WRAPPER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN -DUSE_VULKAN_API -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wvla-extension -Wno-range-loop-analysis -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-missing-braces -Wunused-lambda-capture -Qunused-arguments -fcolor-diagnostics -fdiagnostics-color=always -fno-math-errno -fno-trapping-math -Werror=format -g0 -Oz -DNDEBUG -fPIC -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-range-loop-analysis -fvisibility=hidden -O2 -pthread -std=gnu++17 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o -MF 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o -c ../../torch/csrc/jit/tensorexpr/eval.cpp
```
2023-03-17T14:46:37.7013943Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7117196Z AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7118550Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7119211Z ../../c10/core/ScalarType.h:190:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7119677Z _(int64_t, Long) \
2023-03-17T14:46:37.7120030Z ^~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7120585Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7121076Z static_assert(sizeof(Type) <= sizeof(void*), \
2023-03-17T14:46:37.7121454Z ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7122140Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7123020Z AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7123430Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7123990Z ../../c10/core/ScalarType.h:192:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7124461Z _(double, Double) \
2023-03-17T14:46:37.7124788Z ^~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7125338Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7157284Z static_assert(sizeof(Type) <= sizeof(void*), \
2023-03-17T14:46:37.7157715Z ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
It looks like the static assert fails on one of the platforms for the "double" type. As far as I'm aware, only the "int" type is saved as a pointer now, but I might be incorrect:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
Maybe the "double" type should be excluded here? Or should only int types remain?
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@EikanWang do you know?
> Maybe the "double" type should be excluded here? Or should only int types remain?
TE supports most torch datatypes including `double`, and `SimpleIREvaluator` is a codegen of TE. Hence, we should not just support `int` types.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
@AlekseiNikiforovIBM, this is a Python interface that is intended to be used from Python directly. The initial idea was to enable the user to develop the TE program in Python. Something like this - https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr_pybind.py. But the L833 code does not cover all the cases that are supported by `SimpleIREvaluator`. `SimpleIREvaluator` can support most torch data types, including `double`.
It theoretically could support data types other than tensors and ints in "call" and "call_raw", but the only code pieces and cases I see are the two listed above. I don't see any implementation for basic types other than int, and the current approach wouldn't work for types bigger than a pointer anyway.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
Currently, this is the only piece of code I could find which processes types other than tensors in the related code. So here we have an "int" converted to "int64_t" and stored into a pointer (void*). Let's call it pointer 1.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1280-L1292
Here's the code reading the previously stored value. "data" is a pointer (void*); let's call it pointer 2. Pointer 2 points to pointer 1, which contains an "int64_t", our intended data, which was originally an "int". On 64-bit systems pointer 2 is usually 8 bytes, pointer 1 is also 8 bytes, int64_t is 8 bytes, and int is usually 4 bytes, so everything works. On 32-bit systems pointers are usually 4 bytes, int64_t is 8 bytes, and int is usually 4 bytes. While the int64_t is truncated to 4 bytes when copied into the pointer, since it was originally a 4-byte int, everything still ends well.
Now, let's for example take "double" (int64_t would behave the same) as the value instead of the original "int", on a 32-bit system. A "double" is usually 8 bytes, even on 32-bit systems. So pointer 2 (4 bytes) points to pointer 1 (4 bytes), but pointer 1 should have contained the "double", which is 8 bytes, which it cannot hold.
What I'm trying to say is that for values longer than sizeof(void*) the current approach won't work. But currently I can find only code using "int" types, and luckily for those types it usually works. I propose to fix the "int" and smaller integer type use cases, and to rework the current value binding system when longer ints (int64_t) or floating-point types are actually used.
> I propose to fix the "int" and smaller integer type use cases, and to rework the current value binding system when longer ints (int64_t) or floating-point types are actually used.
It is an issue of TE to support `double`/`int64_t` scalars on a 32-bit system. But we still need to support the other data types, not only the integers.
https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199
Let's separate the two things:
- Enable TE to support big endian
- Fix the issue that TE does not support `double`/`int64_t` on a 32-bit system
@AlekseiNikiforovIBM, how about we focus only on the second thing in this PR to meet your requirement now? Regarding the first one, we need to sort out a good design to fix it, but not in this PR.
I've tested how https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199 works. It looks like it uses a different mechanism and is not affected by this patch at all.
It goes through `ScriptFunction.__call__`:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/python/script_init.cpp#L1385-L1401
and I guess it always creates buffers, so the check `!bufArg.isVar()` is always true.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1274-L1278
I see there are multiple failed checks. Should I take a look at any of them in regards to this patch?
I agree with separating the two topics: big endian support, and `double`/`int64_t` support on 32-bit systems. But from what I can see, there are no places where anything except `int` is used in TE through this mechanism, and it's actually a `var`, not a `buf`.
If such places existed in tests, they should fail for this pull request. And they would have been failing on big endian systems right now, but they're passing. Please let me know if I missed any new failing tests for this pull request too.
So, this pull request fixes TE big endian support, and while it disables `double`/`int64_t` support for some use cases, such use cases are most likely not currently covered by tests.
Then how about we add something like the following to ensure the input is neither `Double` nor `Long` on a big endian system?

```cpp
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);
```

With this guard, your code does not need to add a `static_assert`.
Again, I need to highlight that Python float and int are translated to double and int64_t respectively.
A few lines below, it's already caught in the default case, and an exception would be thrown:
https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312
Should it still be added?
> A few lines below, it's already caught in the default case, and an exception would be thrown:
> https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312
> Should it still be added?
It means that we still need to support double and int64_t, because Python float and int are translated to 64-bit types, as I mentioned before.
@AlekseiNikiforovIBM , I submitted a PR to fix the 32-bit issue - https://github.com/pytorch/pytorch/pull/97669.
@AlekseiNikiforovIBM my PR https://github.com/pytorch/pytorch/pull/97669 has been merged, please rebase this PR.
Thanks, with your change the fix can be simplified.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here