pytorch
TensorExpr eval: fix copying variables from pointers on big endian systems
When copying data from pointers, only the lowest bytes are copied. On little endian systems these bytes are located at the beginning of the pointer-sized value; on big endian systems they are located at the end.
This change fixes the TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.
cc @EikanWang @jgong5
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/96951
:white_check_mark: No Failures
As of commit 256ded05e4a2c3ce542af582b018d065dc5e9bf0: :green_heart: Looks good so far! There are no failures yet. :green_heart:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
need lintfix
There's one more issue:
2023-03-17T14:46:37.7006716Z /opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 --gcc-toolchain=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64 --sysroot=/opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -Iaten/src -I../../aten/src -I. -I../../ -I../../caffe2/aten/src/TH -Icaffe2/aten/src/TH -Icaffe2/aten/src -Icaffe2/../aten/src -I../../torch/csrc -I../../third_party/miniz-2.1.0 -I../../third_party/kineto/libkineto/include -Ivulkan -I../../aten/../third_party/VulkanMemoryAllocator -I../../aten/src/ATen/.. -I../../third_party/FXdiv/include -I../../c10/.. -I../../third_party/pthreadpool/include -I../../third_party/cpuinfo/include -I../../aten/src/ATen/native/quantized/cpu/qnnpack/include -I../../aten/src/ATen/native/quantized/cpu/qnnpack/src -I../../third_party/cpuinfo/deps/clog/include -I../../third_party/NNPACK/include -I../../. 
-I/opt/ndk/sources/third_party/vulkan/src/include -I../../third_party/FP16/include -I../../third_party/fmt/include -I../../third_party/flatbuffers/include -isystem ../../third_party/XNNPACK/include -isystem /opt/ndk/sources/third_party/vulkan/src/common -isystem ../../cmake/../third_party/eigen -isystem ../../caffe2 -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -mfpu=vfpv3-d16 -fno-addrsig -march=armv7-a -mthumb -Wa,--noexecstack -Wformat -Werror=format-security -stdlib=libc++ -frtti -fexceptions -ffunction-sections -fdata-sections -DNO_EXPORT -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DUSE_VULKAN_WRAPPER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_VULKAN -DUSE_VULKAN_API -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wvla-extension -Wno-range-loop-analysis -Wno-pass-failed -Wno-error=pedantic -Wno-error=old-style-cast -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-missing-braces -Wunused-lambda-capture -Qunused-arguments -fcolor-diagnostics -fdiagnostics-color=always -fno-math-errno -fno-trapping-math -Werror=format -g0 -Oz -DNDEBUG -fPIC -Wall -Wextra -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-sign-compare -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-missing-braces -Wno-range-loop-analysis -fvisibility=hidden -O2 -pthread -std=gnu++17 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o -MF 
caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o -c ../../torch/csrc/jit/tensorexpr/eval.cpp
```
2023-03-17T14:46:37.7013943Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7117196Z AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7118550Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7119211Z ../../c10/core/ScalarType.h:190:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7119677Z _(int64_t, Long) \
2023-03-17T14:46:37.7120030Z ^~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7120585Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7121076Z static_assert(sizeof(Type) <= sizeof(void*), \
2023-03-17T14:46:37.7121454Z ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7122140Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7123020Z AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7123430Z ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7123990Z ../../c10/core/ScalarType.h:192:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7124461Z _(double, Double) \
2023-03-17T14:46:37.7124788Z ^~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7125338Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7157284Z static_assert(sizeof(Type) <= sizeof(void*), \
2023-03-17T14:46:37.7157715Z ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
It looks like the static assert fails on one of the platforms for the "double" type. As far as I'm aware, only the "int" type is saved as a pointer now, but I might be incorrect:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
Maybe the "double" type should be excluded here? Or should only int types remain?
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here
Merge failed
Reason: 1 mandatory check(s) failed. The first few are:
Dig deeper by viewing the failures on hud
@EikanWang do you know?
> Maybe the "double" type should be excluded here? Or should only int types remain?
TE supports most torch datatypes including `double`, and `SimpleIREvaluator` is a codegen of TE. Hence, we should not just support `int` types.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
@AlekseiNikiforovIBM, this is a Python interface that is intended to be used from Python directly. The initial idea was to enable the user to develop the TE program in Python. Something like this - https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr_pybind.py. But the L833 code does not cover all the cases that are supported by `SimpleIREvaluator`. `SimpleIREvaluator` can support most torch data types, including `double`.
It theoretically could support data types other than tensors and ints in "call" and "call_raw", but the only code pieces and cases I see are the two listed above. I don't see any implementation for basic types other than int, and the current approach wouldn't work for types bigger than a pointer anyway.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833
Currently, this is the only piece of code I could find which processes types other than tensors in the related code. So here we have an "int" converted to "int64_t" and stored into a pointer (void*). Let's call it pointer 1.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1280-L1292
Here's the code reading the previously stored value. "data" is a pointer (void*); let's call it pointer 2. Pointer 2 points to pointer 1, which contains an "int64_t", our intended data, which was originally an "int". On 64-bit systems pointer 2 is usually 8 bytes, pointer 1 is also 8 bytes, int64_t is 8 bytes, and int is usually 4 bytes, so everything works. On 32-bit systems pointers are usually 4 bytes, int64_t is 8 bytes, and int is usually 4 bytes. While the int64_t is truncated to 4 bytes when copied into the pointer, since it was originally a 4-byte int, everything still ends well.
Now, let's for example take "double" (int64_t would behave the same) as the value instead of the original "int", on a 32-bit system. A "double" is usually 8 bytes, even on 32-bit systems. So pointer 2 (4 bytes) points to pointer 1 (4 bytes), but pointer 1 should have contained the "double", which is 8 bytes, which it cannot hold.
What I'm trying to say is that for values longer than sizeof(void*) the current approach won't work. But currently I can find only code using "int" types, and luckily for those types it usually works. I propose to fix the "int" and smaller integer type use cases, and to rework the current value binding system when longer ints (int64_t) or floating-point types are actually used.
> I propose to fix the "int" and smaller integer type use cases, and to rework the current value binding system when longer ints (int64_t) or floating-point types are actually used.
It is an issue of TE to support `double`/`int64_t` scalars on a 32-bit system. But we still need to support the other data types, not only the integers.
https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199
Let's separate the two things:
- Enable TE to support big endian
- Fix the issue that TE does not support `double`/`int64_t` on a 32-bit system
@AlekseiNikiforovIBM, how about we focus only on the second thing in this PR to meet your requirement now? Regarding the first one, we need to sort out a good design to fix it, but not in this PR.
I've tested how https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199 works. It looks like it uses a different mechanism and is not affected by this patch at all.
It goes through `ScriptFunction.__call__`:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/python/script_init.cpp#L1385-L1401
and I guess it always creates buffers, so the check `!bufArg.isVar()` is always true.
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1274-L1278
I see there are multiple failed checks. Should I take a look at any of them in regards to this patch?
I agree with separating the two topics: big endian support, and `double`/`int64_t` support on 32-bit systems. But from what I can see, there are no places where anything except `int` is used in TE through this mechanism, and it's actually a `var`, not a `buf`.
If such places existed in tests, they should fail for this pull request. And they would have been failing on big endian systems right now, but they're passing. Please let me know if I missed any new failing tests for this pull request too.
So, this pull request fixes TE big endian support, and while it disables `double`/`int64_t` support for some use cases, such use cases are most likely not currently covered by tests.
Then how about we add something like the following to ensure the input is neither `Double` nor `Long` on a big endian system?

```cpp
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);
```

With this guard, your code does not need to add a `static_assert`.
Again, I need to highlight that Python float and int are translated to double and int64_t respectively.
A few lines below, it's already caught in the default case, and an exception would be thrown:
https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312
Should it still be added?
> A few lines below, it's already caught in the default case, and an exception would be thrown:
> https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312
> Should it still be added?
It means that we still need to support double and int64_t, because Python float and int are translated to 64-bit types, as I mentioned before.
@AlekseiNikiforovIBM , I submitted a PR to fix the 32-bit issue - https://github.com/pytorch/pytorch/pull/97669.
@AlekseiNikiforovIBM my PR https://github.com/pytorch/pytorch/pull/97669 has been merged, please rebase this PR.
Thanks, with your change the fix can be simplified.
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team
Advanced Debugging: check the merge workflow status here