
Trouble compiling on Pi 4

Open krakenrf opened this issue 1 year ago • 20 comments

Hi, the docs say this can be built on a Raspberry Pi, but I've been trying to compile this on a Pi 4 running Ubuntu 24.10 for the last few days without luck. Running on a 64GB SD card which I hope is big enough.

I've managed to get a little further each time with some fixes:

  1. First I couldn't run fetch_sources as Git simply did not want to download llvm as I think it's too big. Using --shallow helped.

  2. I kept getting an error cc: error: unrecognized command-line option during make. It seems that the compiler it was trying to use was gcc by default. I forced it to use clang by adding -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ to cmake.

  3. I have the Pi 4 2GB and 8GB models. At first I tried the 2GB model, and the compilation just kept crashing and causing the entire board to restart. Switching to the 8GB model helped me compile to a higher percentage, but it still crashed either with a reset, or just the terminal window closing. I then increased the swap space to 16GB, and again that got me further, but I still end up with a crash, or just with the terminal window closing itself without completion.

  4. I tried cmake ../ -DCLVK_BUILD_TESTS=OFF -DLLVM_INCLUDE_BENCHMARKS=OFF -DLLVM_INCLUDE_TESTS=OFF -DLLVM_ENABLE_BINDINGS=OFF -DLLVM_ENABLE_UNWIND_TABLES=OFF -DLLVM_BUILD_TOOLS=OFF -DCLSPV_BUILD_SPIRV_DIS=OFF -DCLSPV_BUILD_TESTS=OFF -DCLVK_BUILD_TESTS=OFF -DCLVK_BUILD_SPIRV_TOOLS=OFF -DCLVK_ENABLE_SPIRV_IL=OFF -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ to try to cut down on some of the things that need to compile, but at some point during an overnight compile the terminal window just closes itself, and the compilation never completes.

  5. Running make -j1 to force one core seems to help to get it a little further, but ultimately I still end up with a reset or terminal window close.

The culprit appears to be that it never gets past the LLVM compilation. One guess I have is that the crashes come from running out of memory, but I've never seen the RAM bar full in htop during compilation. The crashes always occur when I leave it running overnight, though, so I may be missing it.
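One way to test the out-of-memory guess after an overnight crash is to search the kernel log for OOM-killer messages. The following is only a sketch: `find_oom_events` is a hypothetical helper name, the patterns are the usual Linux kernel wording and may differ between kernel versions, and reading the previous boot assumes journald keeps persistent logs. A hard board reset may leave no trace in the log at all.

```shell
# Scan kernel log text read from stdin for signs that the OOM killer ran.
# Prints the matching lines; exit status is 0 if any were found, 1 otherwise.
# (find_oom_events is a hypothetical helper name, not part of clvk.)
find_oom_events() {
    grep -i -E 'out of memory|oom-killer|oom_reaper'
}

# Example usage (may need sudo):
#   journalctl -k -b -1 | find_oom_events   # kernel log of the previous boot
#   dmesg | find_oom_events                 # kernel log of the current boot
```

If the compiler or the terminal shows up as a killed process, memory is the likely cause; if nothing shows up, power or thermal problems are worth considering as well.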

I am interested to hear if anyone has anything else I should try, or if anyone else has successfully installed it on a Pi 4 recently.

krakenrf avatar Nov 20 '24 22:11 krakenrf

You can try to use a prebuilt libclc to reduce the build load.

Build libclc on something other than a Raspberry Pi: https://github.com/kpet/clvk/blob/main/.github/workflows/presubmit.yml#L112

Copy clspv--.bc and clspv64--.bc to your Raspberry Pi, then add the following to your cmake command:

-DCLSPV_EXTERNAL_LIBCLC_DIR=<place_where_clspv--.bc_is>

rjodinchr avatar Nov 21 '24 06:11 rjodinchr

Thanks, I just tried that. I couldn't figure out how to just compile clspv, so I just compiled the whole clvk project on a fast Linux computer, and found the clspv files, and copied them over to the Pi 4. Then I ran:

cmake ../ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCLSPV_EXTERNAL_LIBCLC_DIR=../libclc

But I'm getting stuck on this error now:

[  0%] Built target clspv_baked_opencl_header
make[2]: *** No rule to make target 'external/clspv/libclc/clspv--.bc', needed by 'external/clspv/include/clspv/clspv_builtin_library.h'.  Stop.
make[1]: *** [CMakeFiles/Makefile2:245790: external/clspv/cmake/CMakeFiles/clspv_builtin_library.dir/all] Error 2

krakenrf avatar Nov 21 '24 11:11 krakenrf

It looks like clspv--.bc is just not in the external/clspv/libclc directory. Are you sure about what you copied?

rjodinchr avatar Nov 21 '24 12:11 rjodinchr

I had put them in ../libclc, but I also tried putting them in external/clspv/libclc, but with the same results.

I'm guessing it's not as simple as just copying the clspv--.bc over from something I built on another platform. Looking at the readme, compilation of clspv is covered under cross-compilation, so I'm assuming I need to do something with cross-compilation?

Unfortunately those instructions in the readme are not clear to me. I gave it a go: on the x64 machine I created a folder clang_host and ran the cmake command, but got the error: The source directory "home/carl/clvk/external/clspv/third_party/llvm" does not appear to contain CMakeLists.txt

krakenrf avatar Nov 21 '24 12:11 krakenrf

Alright, then the issue is ../libclc. This is a relative path, you should avoid them as much as possible as you never know where they will end up being used. Instead use the following: -DCLSPV_EXTERNAL_LIBCLC_DIR=$(realpath ../libclc)
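As a small illustration of the point about relative paths, the sketch below resolves the directory once and prints the complete cmake argument. `libclc_dir_arg` is a hypothetical helper name; `realpath` is assumed to be available (it ships with GNU coreutils).

```shell
# Print the cmake argument for a prebuilt libclc directory as an absolute path.
# $1: directory holding the prebuilt .bc files (may be relative)
# (libclc_dir_arg is a hypothetical helper name, not part of clvk.)
libclc_dir_arg() {
    printf '%s=%s\n' '-DCLSPV_EXTERNAL_LIBCLC_DIR' "$(realpath "$1")"
}

# Example usage from the build directory:
#   cmake ../ -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ "$(libclc_dir_arg ../libclc)"
```

Because the path is resolved in the shell before cmake sees it, it no longer matters which directory the build system happens to be in when it later looks for the file.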

rjodinchr avatar Nov 21 '24 12:11 rjodinchr

Nice! Looks like that worked well, I was finally able to complete the compilation and it's working. Thanks!

krakenrf avatar Nov 22 '24 02:11 krakenrf

Unfortunately while it compiled, and the software shows a selectable V3D option, it doesn't seem to actually work.

I ran ./simple_test and got:

Platform: clvk
Device: V3D 4.2.14
/home/dd/clvk/tests/simple/simple.cpp:76 error after CL call: -11

Interestingly, it also seems to have failed on my Ubuntu laptop, as I get the same error on it:

Platform: clvk
Device: Intel(R) UHD Graphics (TGL GT1)
/home/carl/clvk/tests/simple/simple.cpp:76 error after CL call:-11

I guess the clspv--.bc I compiled on the laptop must not be working?

krakenrf avatar Nov 25 '24 01:11 krakenrf

I would not expect the libclc binaries to be the issue here, but it's not impossible.

Could you run it with the following environment variables set:

CLVK_LOG=4
CLVK_LOG_DEST=file:clvk.log

and upload clvk.log here? That would help us analyse what is going wrong.
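For reference, one way to set both variables for a single run without exporting them into the whole shell session is sketched below. `run_with_clvk_log` is a hypothetical wrapper name; the variable names and values are the ones given above.

```shell
# Run a command with clvk logging enabled for that command only.
# With CLVK_LOG_DEST=file:clvk.log the log is expected in the current directory.
# (run_with_clvk_log is a hypothetical helper name, not part of clvk.)
run_with_clvk_log() {
    env CLVK_LOG=4 CLVK_LOG_DEST=file:clvk.log "$@"
}

# Example usage:
#   run_with_clvk_log ./simple_test
```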

rjodinchr avatar Nov 25 '24 07:11 rjodinchr

Thanks, here is the output on my Ubuntu 22.04 laptop. Will get the Pi 4 one later.

clvk.log

krakenrf avatar Nov 25 '24 09:11 krakenrf

Hmm, I see. When we build the tests, we create a clvk.conf in the build folder. That file contains an entry that prevents compilation.

You can either remove that file or run the binary from another directory.
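A cautious variant of removing the file is to rename it, so it can be restored later. This sketch assumes clvk picks up clvk.conf from the directory the binary is run in, which is what the behaviour above suggests; `set_aside_clvk_conf` and the `.disabled` suffix are hypothetical names chosen for illustration.

```shell
# Move a stray clvk.conf out of the way instead of deleting it.
# $1: directory to check (defaults to the current directory)
# (set_aside_clvk_conf and the .disabled suffix are hypothetical.)
set_aside_clvk_conf() {
    dir=${1:-.}
    if [ -f "$dir/clvk.conf" ]; then
        mv "$dir/clvk.conf" "$dir/clvk.conf.disabled"
        echo "moved $dir/clvk.conf to $dir/clvk.conf.disabled"
    else
        echo "no clvk.conf in $dir"
    fi
}

# Example usage from the build directory:
#   set_aside_clvk_conf .
```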

rjodinchr avatar Nov 25 '24 09:11 rjodinchr

We should fix this. There's no reason for this broken config file to end up in the default location :).

kpet avatar Nov 26 '24 19:11 kpet

Thanks, that does fix the issue with the test not running. The test works fine now.

However, I'm still having issues getting a kernel to run from a program I'm trying to test clvk with. I'm not entirely sure if it's the kernel that is incompatible, or something wrong with my clvk install (or probably both), but the error I currently get makes it seem like my clvk install is not right.

First, this is my command to start the software (I tried both clspv64 and clspv - not sure which one I should use but they both give the same error):

CLVK_CLSPV_PATH=/home/dd/clvk/libclc/clspv64--.bc LD_LIBRARY_PATH=/home/dd/clvk/build/ satdump-ui

When running something that uses the GPU the error is:

(E) Error warping on GPU : Error building: /usr/bin/lli-15: lli: /home/dd/clvk/libclc/clspv64--.bc: error: Unknown attribute kind (86) (Producer: 'LLVM20.0.0git' Reader: 'LLVM 15.0.6')


The error made me think it's something to do with the system clang version so later I manually installed llvm20/clang20 and tried to recompile clvk on a Pi 5 (which BTW unlike the Pi 4 can complete the entire compilation when using the default clang with make -j1 and 8GB RAM, 16GB of swap). But the compilation with clang20 installed failed with the following error:

[ 93%] Building CXX object external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/val/validate_decorations.cpp.o
In file included from /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:15:
In file included from /usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/algorithm:61:
In file included from /usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:61:
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_tempbuf.h:263:8: error: 'get_temporary_buffer<MemberOffsetPair>' is deprecated [-Werror,-Wdeprecated-declarations]
  263 |                 std::get_temporary_buffer<value_type>(_M_original_len));
      |                      ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:4996:15: note: in instantiation of member function 'std::_Temporary_buffer<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, MemberOffsetPair>::_Temporary_buffer' requested here
 4996 |       _TmpBuf __buf(__first, (__last - __first + 1) / 2);
      |               ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_algo.h:5070:23: note: in instantiation of function template specialization 'std::__stable_sort<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, __gnu_cxx::__ops::_Iter_comp_iter<(lambda at /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:482:9)>>' requested here
 5070 |       _GLIBCXX_STD_A::__stable_sort(__first, __last,
      |                       ^
/home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:480:10: note: in instantiation of function template specialization 'std::stable_sort<__gnu_cxx::__normal_iterator<MemberOffsetPair *, std::vector<MemberOffsetPair>>, (lambda at /home/dd/clvk/external/SPIRV-Tools/source/val/validate_decorations.cpp:482:9)>' requested here
  480 |     std::stable_sort(
      |          ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/c++/12/bits/stl_tempbuf.h:99:5: note: 'get_temporary_buffer<MemberOffsetPair>' has been explicitly marked deprecated here
   99 |     _GLIBCXX17_DEPRECATED
      |     ^
/usr/lib/gcc/aarch64-linux-gnu/12/../../../../include/aarch64-linux-gnu/c++/12/bits/c++config.h:119:34: note: expanded from macro '_GLIBCXX17_DEPRECATED'
  119 | # define _GLIBCXX17_DEPRECATED [[__deprecated__]]
      |                                  ^
1 error generated.
make[2]: *** [external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/build.make:656: external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/val/validate_decorations.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:12847: external/SPIRV-Tools/source/CMakeFiles/SPIRV-Tools-static.dir/all] Error 2
make: *** [Makefile:156: all] Error 2

krakenrf avatar Nov 27 '24 05:11 krakenrf

clspv--.bc is not the clspv compiler; it is the libclc used by the compiler. It is statically linked into clspv, so once clspv is compiled it is no longer needed.

You should not need to define CLVK_CLSPV_PATH to use clvk, did you encounter an error trying to run without it? You can try to compile clvk with -DCLVK_CLSPV_ONLINE_COMPILER=1 (in your cmake arguments). That will link clspv statically in clvk's OpenCL library (libOpenCL.so).

rjodinchr avatar Nov 27 '24 05:11 rjodinchr

Ah okay that's my bad I must have misunderstood the readme.

When running without CLVK_CLSPV_PATH, I still end up with warnings and error building, but it might be kernel compatibility related:

(E) Error warping on GPU : Error building: clvk-fJ1Bn1/source.cl:34:13: warning: comparing floating point with == or != is unsafe
   34 |   if (shift == 0)
      |       ~~~~~ ^  ~
clvk-fJ1Bn1/source.cl:37:9: warning: mixing declarations and code is incompatible with standards before C99
   37 |   float x = cos(*lat * DEG_TO_RAD) * cos(*lon * DEG_TO_RAD);
      |         ^
clvk-fJ1Bn1/source.cl:59:9: warning: mixing declarations and code is incompatible with standards before C99
   59 |   float dist1 = SQ(xr[1] - pxy[0]) + SQ(yr[1] - pxy[1]);
      |         ^
clvk-fJ1Bn1/source.cl:79:23: warning: implicit conversion from '__private int' to 'float' may lose precision
   79 |   float x_diff = rx - x;
      |                     ~ ^
clvk-fJ1Bn1/source.cl:80:23: warning: implicit conversion from '__private int' to 'float' may lose precision
   80 |   float y_diff = ry - y;
      |                     ~ ^
clvk-fJ1Bn1/source.cl:82:29: warning: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned i

Previously I tried fixing these warnings as they were just simple C issues, but ultimately I still ended up with:

(E) Error warping on GPU : Error building:

And then no additional error message or warnings given.

I guess it's just that this kernel would need some work to be compatible with the limitations of clvk and clspv? https://github.com/SatDump/SatDump/blob/master/resources/opencl/warp_image_thin_plate_spline_fp32.cl

But I also tried stripping down the kernel to just an empty function to see if it would at least build without an error, and it still failed with the same error. I admit I don't really know what I'm doing here with the kernel though, or whether the errors I'm getting are SatDump related or clvk related.

kernel void warp_image_thin_plate_spline(
    global ushort *map_image,
    global ushort *img,
    global int *tps_no_points,
    global float *tps_x,
    global float *tps_y,
    global float *tps_coef_1,
    global float *tps_coef_2,
    global float *tps_xmean,
    global float *tps_ymean,
    global int *img_settings) {

    // Suppress warnings
    (void)map_image;
    (void)img;
    (void)tps_no_points;
    (void)tps_x;
    (void)tps_y;
    (void)tps_coef_1;
    (void)tps_coef_2;
    (void)tps_xmean;
    (void)tps_ymean;
    (void)img_settings;
}

Is there any way to just compile the kernel with clvk or clspv standalone, outside of the SatDump software? Then at least I can confirm if the kernel is compatible or not.

krakenrf avatar Nov 27 '24 09:11 krakenrf

Please provide an updated clvk.log from a run with the application unmodified (https://github.com/kpet/clvk/issues/743#issuecomment-2497039882)

rjodinchr avatar Nov 27 '24 09:11 rjodinchr

clvk.log

Ah looks like support is missing for Int16?

krakenrf avatar Nov 27 '24 09:11 krakenrf

Also I just want to add another observation. When I run SatDump with clvk and the same kernel on my Intel laptop, it compiles, runs, and finishes. But no final image is generated for some reason. The warped image is just not there.

If I run the warp on the Intel laptop without clvk, using the native OpenCL implementation, it works fine.

The same problem with the image not appearing happens on the Raspberry Pi 4/5 and Intel Laptop if I try to run it with llvmpipe.

I'll attach clvk.log for a llvmpipe run on the Pi 4 just in case it helps.

clvk.log

krakenrf avatar Nov 27 '24 10:11 krakenrf

The failure you have right now happens because the Vulkan driver that clvk tries to use does not support spv::CapabilityInt16.

But you have 2 Vulkan implementations on your platform:

[CLVK] Found 2 physical devices
[CLVK] linux_read_sorted_physical_devices:
[CLVK]      Original order:
[CLVK]            [0] llvmpipe (LLVM 15.0.6, 128 bits)
[CLVK]            [1] V3D 4.2.14

[0] llvmpipe is a software emulation, and it's the one being used by your application. On top of that it does not support Int16.

I think what you want is to use [1] V3D, which may support Int16.

rjodinchr avatar Nov 27 '24 10:11 rjodinchr

I'm fairly certain it is using V3D. In the SatDump software I can choose between V3D and llvmpipe as the OpenCL device to use.

Choosing V3D yields the first clvk.log I posted, and the GPU building error.

Choosing llvmpipe yields the second clvk.log I posted. No error, but no image.

krakenrf avatar Nov 27 '24 10:11 krakenrf

Alright, then your issue with V3D is that it does not support Int16. At the moment clspv is not capable of generating the code without it.
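To check this on a given device outside of clvk, the Vulkan feature that corresponds to the SPIR-V Int16 capability is `shaderInt16`, and `vulkaninfo` reports it per device. The sketch below filters that output; `shader_int16_values` is a hypothetical helper name, and the `name = value` layout of the vulkaninfo text is an assumption that may vary between versions.

```shell
# Read vulkaninfo-style text on stdin and print the value of shaderInt16
# for every device section found, one value per line.
# Assumes lines of the form "shaderInt16 = true" (layout may vary by version).
# (shader_int16_values is a hypothetical helper name.)
shader_int16_values() {
    awk '$1 == "shaderInt16" && $2 == "=" { print $3 }'
}

# Example usage:
#   vulkaninfo | shader_int16_values
```

With two Vulkan devices present, two values would be printed, in the order vulkaninfo lists the devices.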

For the issue with llvmpipe, I don't see anything in the log.

rjodinchr avatar Nov 27 '24 10:11 rjodinchr