rocThrust support?

Open StarGazerM opened this issue 1 year ago • 18 comments

Hi:

It's great to see that rocPRIM is supported by chipStar. Does that mean rocThrust can also be supported out of the box?

StarGazerM avatar Oct 29 '24 15:10 StarGazerM

It doesn't build out of the box due to compilation issues, but I'll look into it.

pvelesko avatar Oct 29 '24 19:10 pvelesko

Check out the rocThrust branch and download the submodules. I was able to compile most of the rocThrust examples. Let me know what issues you encounter.

# build chipStar rocThrust branch and install
cmake ../ -DCHIP_BUILD_ROCPRIM=ON -DCMAKE_INSTALL_PREFIX=/space/pvelesko/install/HIP/chipStar/test
ninja install

# set your paths
cd chipStar/rocThrust
# cmake configure and install (not yet integrated into chipStar build system)
cmake ../ -DCMAKE_CXX_COMPILER=hipcc  -DCMAKE_INSTALL_PREFIX=/space/pvelesko/install/HIP/chipStar/test/rocthrust  -DBUILD_EXAMPLES=OFF -DCMAKE_C_COMPILER=hipcc

@StarGazerM

pvelesko avatar Oct 30 '24 05:10 pvelesko

wow! this looks so great!! I will try it

StarGazerM avatar Oct 30 '24 13:10 StarGazerM

I tried the build. An error pops up when building the examples, but it is in upstream rocPRIM:

/usr/local/include/rocprim/intrinsics/atomic.hpp:51:16: error: no matching function for call to 'atomicAdd'

StarGazerM avatar Oct 30 '24 18:10 StarGazerM

Did you build exactly how I did? This should be fixed in the submodules.

pvelesko avatar Oct 30 '24 18:10 pvelesko

I switched -DBUILD_EXAMPLES=OFF to ON for the rocThrust build.

StarGazerM avatar Oct 30 '24 23:10 StarGazerM

A couple of examples fail to build due to a SPIR-V translation error, but the atomicAdd error was fixed.

The issue is that you have a previously installed version that is getting picked up. That previous version was built without the atomic fix.

Please remove it and build again. This should build rocPRIM in-tree with the atomic fix.
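
One way to spot a stale install before rebuilding is to look for rocPRIM header directories under the install prefixes on the machine. The following is only a minimal sketch: the function name find_rocprim_installs is hypothetical, and the prefix list is an assumption (only /usr/local appears in this thread).

```shell
#!/usr/bin/env bash
# Print every <prefix>/include/rocprim directory that exists, so that a
# previously installed copy of the rocPRIM headers can be found and removed.
# (Hypothetical helper; not part of chipStar or rocPRIM.)
find_rocprim_installs() {
    local prefix
    for prefix in "$@"; do
        if [ -d "$prefix/include/rocprim" ]; then
            echo "$prefix/include/rocprim"
        fi
    done
}

# Assumed candidate prefixes; adjust to wherever you have installed things.
find_rocprim_installs /usr/local /usr /opt/rocm
```

Any directory it prints that does not belong to the fresh in-tree build is a candidate for removal.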

pvelesko avatar Oct 31 '24 04:10 pvelesko

I see, I will try it and report back! Is https://github.com/CHIP-SPV/rocPRIM/commit/4894fd75ab1ab4b3ecc2d3e9f4703a3e37621113 the patch you mentioned? (Just curious what actually fixes this.)

StarGazerM avatar Oct 31 '24 05:10 StarGazerM

No, the patch is in the chipStar atomics, which get copied into the rocPRIM headers, I believe.

pvelesko avatar Oct 31 '24 08:10 pvelesko

Here are the Docker commands I am using. I install everything into /usr/local, but the build still seems to throw a function mismatch on atomicAdd:

RUN git clone -b rocThrust https://github.com/CHIP-SPV/chipStar.git && \
    cd chipStar && git submodule update --init --recursive 
RUN module unload pocl; module load oneapi/2024.1.0 && which icpx  && \
    cd chipStar &&  \
    mkdir build && \
    cd build && \
    cmake .. -DCMAKE_BUILD_TYPE=Release -DCHIP_BUILD_HIPBLAS=ON -DCHIP_BUILD_ROCPRIM=ON -DCMAKE_INSTALL_PREFIX=/usr/local && \
    sudo make -j$(nproc) && \
    /home/chipStarUser/chipStar/build/samples/0_MatrixMultiply/MatrixMultiply && \
    sudo make install 

RUN module unload pocl && module load oneapi/2024.1.0 &&  module load pocl && \
    cd /home/chipStarUser/chipStar/rocThrust && \
    cmake -Bbuild -DCMAKE_CXX_COMPILER=hipcc -DCMAKE_C_COMPILER=hipcc -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr/local . && \
    sudo cmake --build build -j$(nproc) && \
    cd build && sudo make install

I saw https://github.com/CHIP-SPV/chipStar/blob/b985dd29f699c24216d4265347895b929bc398d0/include/hip/devicelib/atomics.hh#L59 in the file, so it seems this should work?

StarGazerM avatar Oct 31 '24 14:10 StarGazerM

Ah - try adding -DCHIP_ENABLE_NON_COMPLIANT_DEVICELIB_CODE=ON
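
After re-running cmake with that flag, it may be worth confirming that the value was actually recorded in the build directory. CMake stores cache entries in CMakeCache.txt as NAME:TYPE=VALUE lines, so a small sketch like the one below can read it back. The helper name cmake_cache_value is hypothetical, and the build/CMakeCache.txt path in the comment assumes the build directory used earlier in this thread.

```shell
#!/usr/bin/env bash
# Print the cached value of a CMake variable from a CMakeCache.txt file.
# Cache entries have the form NAME:TYPE=VALUE; prints nothing if unset.
# (Hypothetical helper; not part of CMake or chipStar.)
cmake_cache_value() {
    local var="$1" cache="$2"
    sed -n "s/^${var}:[^=]*=//p" "$cache"
}

# Example (assumed path):
#   cmake_cache_value CHIP_ENABLE_NON_COMPLIANT_DEVICELIB_CODE build/CMakeCache.txt
```

If the helper prints nothing, the option was never passed to this build directory.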

pvelesko avatar Oct 31 '24 14:10 pvelesko

After enabling this flag, I got the linker error below. Does that mean force-enabling it causes an integer-width issue? It could be related to my hardware; I am using an Intel A770 for testing:

[ 57%] Building CXX object examples/CMakeFiles/example_thrust_scan_by_key.dir/scan_by_key.cu.o
InvalidBitWidth: Invalid bit width in input: 24

I also tested the behavior of the compiled programs (the sort examples). It seems device_vector is not working correctly: it runs into an OOM when doing an H->D copy (which doesn't make sense to me). I can dig more into it. Do you have any clue what the problem could be?

CHIP error [TID 4154] [1730387918.171449236] : hipErrorOutOfMemory (CL_OUT_OF_RESOURCES ) in /home/chipStarUser/chipStar/src/backend/OpenCL/CHIPBackendOpenCL.cc:1767:finish

CHIP error [TID 4154] [1730387918.171550807] : Caught Error: hipErrorOutOfMemory
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  __copy::trivial_device_copy H->D: failed: hipErrorOutOfMemory: hipErrorOutOfMemory
Aborted (core dumped)

StarGazerM avatar Oct 31 '24 15:10 StarGazerM

> [ 57%] Building CXX object examples/CMakeFiles/example_thrust_scan_by_key.dir/scan_by_key.cu.o
> InvalidBitWidth: Invalid bit width in input: 24

This is a bug in the LLVM -> SPIR-V translator. I filed an issue for it today: https://github.com/KhronosGroup/SPIRV-LLVM-Translator/issues/2823

Regarding the test, you didn't specify which sort test you ran. I ran rocprim.device_merge_sort, and only 2 out of 34 tests fail.

26: [  PASSED  ] 32 tests.
26: [  FAILED  ] 2 tests, listed below:
26: [  FAILED  ] RocprimDeviceSortTests/13.SortKey, where TypeParam = DeviceSortParams<double, test_utils::custom_test_type<double>, rocprim::less<double> >
26: [  FAILED  ] RocprimDeviceSortTests/13.SortKeyValue, where TypeParam = DeviceSortParams<double, test_utils::custom_test_type<double>, rocprim::less<double> >

I'll have to look into this

pvelesko avatar Oct 31 '24 15:10 pvelesko

I am running example_thrust_sort under rocThrust/example. The issue is that the device/host copy fails when using device_vector, both for random access on the device_vector and for copying from its internal data.

StarGazerM avatar Oct 31 '24 15:10 StarGazerM

Hello everyone, I am currently using chipStar to compile the rocThrust tests. How is your progress? I can only compile up to 20% currently. Can you tell me how far you have gotten and how well chipStar currently supports rocThrust? Thank you so much.

he97 avatar Dec 18 '24 14:12 he97

> Hello everyone, I am currently using chipStar to compile the rocThrust tests. How is your progress? I can only compile up to 20% currently. Can you tell me how far you have gotten and how well chipStar currently supports rocThrust? Thank you so much.

rocThrust is not working due to an upstream problem in LLVM's SPIR-V backend.

StarGazerM avatar Dec 20 '24 06:12 StarGazerM

> How is your progress? I can only compile to 20% currently.

20% of samples?

The current status is that there are some SPIR-V issues when compiling certain samples. I'll be getting back to this after New Year's.

pvelesko avatar Dec 22 '24 13:12 pvelesko

@StarGazerM @he97

100% of rocThrust examples are now passing: https://github.com/CHIP-SPV/chipStar/tree/fix-promote-int-pass

It would be great if you could test it.

pvelesko avatar Mar 22 '25 08:03 pvelesko