stable-diffusion.cpp Workaround to build with CUDA

After a git submodule update (ggml) on main, the CUDA build fails:
-- The C compiler identification is GNU 10.2.1
-- The CXX compiler identification is GNU 10.2.1
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Build static library
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- x86 detected
-- Adding CPU backend variant ggml-cpu: -march=native
-- Found BLAS: /root/aocl/5.0.0/gcc/lib/libblis.so
-- BLAS found, Libraries: /root/aocl/5.0.0/gcc/lib/libblis.so
-- BLAS found, Includes: /root/aocl/5.0.0/gcc/include
-- Including BLAS backend
-- Found CUDAToolkit: /usr/local/cuda/include
-- CUDA Toolkit found
-- Using CUDA architectures: 50;61;70;75;80
-- The CUDA compiler identification is NVIDIA 11.8.89
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- CUDA host compiler is GNU 10.2.1

-- Including CUDA backend
-- Configuring done
-- Generating done
-- Build files have been written to: /var/www/stats/stable-diffusion.cpp/build
Scanning dependencies of target ggml-base
Scanning dependencies of target zip
[  1%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-quants.c.o
[  2%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml.c.o
[  5%] Building C object ggml/src/CMakeFiles/ggml-base.dir/ggml-alloc.c.o
[  5%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-backend.cpp.o
[  5%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-opt.cpp.o
[  6%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/ggml-threading.cpp.o
[  7%] Building C object thirdparty/CMakeFiles/zip.dir/zip.c.o
[  7%] Building CXX object ggml/src/CMakeFiles/ggml-base.dir/gguf.cpp.o
In file included from /var/www/stats/stable-diffusion.cpp/thirdparty/zip.c:40:
/var/www/stats/stable-diffusion.cpp/thirdparty/miniz.h:4988:9: note: ‘#pragma message: Using fopen, ftello, fseeko, stat() etc. path for file I/O - this path may not support large files.’
 4988 | #pragma message(                                                               \
      |         ^~~~~~~
[  8%] Linking CXX static library libggml-base.a
[  8%] Built target ggml-base
Scanning dependencies of target ggml-cpu
Scanning dependencies of target ggml-blas
Scanning dependencies of target ggml-cuda
[ 10%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.c.o
[ 10%] Building CXX object ggml/src/ggml-blas/CMakeFiles/ggml-blas.dir/ggml-blas.cpp.o
[ 11%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/mmq.cpp.o
[ 12%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu.cpp.o
[ 13%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-aarch64.cpp.o
[ 15%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-traits.cpp.o
[ 15%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-hbm.cpp.o
[ 16%] Building C object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/ggml-cpu-quants.c.o
[ 16%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/amx/amx.cpp.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmv.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/arange.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cpy.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/acc.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/gla.cu.o
[ 28%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-wmma-f16.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/ggml-cuda.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/im2col.cu.o
[ 28%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/concat.cu.o
[ 28%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/getrows.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f32.cu.o
[ 30%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/binbcast.cu.o
[ 30%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/convert.cu.o
[ 32%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argmax.cu.o
[ 32%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/conv-transpose-1d.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/count-equal.cu.o
[ 33%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmq.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn.cu.o
[ 27%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o
[ 34%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/cross-entropy-loss.cu.o
[ 35%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/argsort.cu.o
[ 35%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/fattn-tile-f16.cu.o
[ 36%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/diagmask.cu.o
[ 37%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/mmvq.cu.o
[ 38%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/norm.cu.o
[ 39%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/opt-step-adamw.cu.o
[ 40%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/out-prod.cu.o
[ 41%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pad.cu.o
[ 42%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/pool2d.cu.o
[ 43%] Linking CXX static library libggml-cpu.a
[ 44%] Linking CXX static library libggml-blas.a
[ 44%] Built target ggml-cpu
[ 45%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/quantize.cu.o
[ 45%] Built target ggml-blas
[ 46%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/rope.cu.o
/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu(11): error: more than one conversion function from "const half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void op_clamp(const T *, T *, T, T, int) [with T=half]"
(17): here
            instantiation of "void clamp_cuda(const T *, T *, T, T, int, cudaStream_t) [with T=half]"
(37): here

/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu(11): error: more than one conversion function from "const half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void op_clamp(const T *, T *, T, T, int) [with T=half]"
(17): here
            instantiation of "void clamp_cuda(const T *, T *, T, T, int, cudaStream_t) [with T=half]"
(37): here

[ 46%] Built target zip
/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu(11): error: more than one conversion function from "const half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void op_clamp(const T *, T *, T, T, int) [with T=half]"
(17): here
            instantiation of "void clamp_cuda(const T *, T *, T, T, int, cudaStream_t) [with T=half]"
(37): here

/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu(11): error: more than one conversion function from "const half" to a built-in type applies:
            function "__half::operator float() const"
/usr/local/cuda/include/cuda_fp16.hpp(204): here
            function "__half::operator short() const"
/usr/local/cuda/include/cuda_fp16.hpp(222): here
            function "__half::operator unsigned short() const"
/usr/local/cuda/include/cuda_fp16.hpp(225): here
            function "__half::operator int() const"
/usr/local/cuda/include/cuda_fp16.hpp(228): here
            function "__half::operator unsigned int() const"
/usr/local/cuda/include/cuda_fp16.hpp(231): here
            function "__half::operator long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(234): here
            function "__half::operator unsigned long long() const"
/usr/local/cuda/include/cuda_fp16.hpp(237): here
            function "__half::operator __nv_bool() const"
/usr/local/cuda/include/cuda_fp16.hpp(241): here
          detected during:
            instantiation of "void op_clamp(const T *, T *, T, T, int) [with T=half]"
(17): here
            instantiation of "void clamp_cuda(const T *, T *, T, T, int, cudaStream_t) [with T=half]"
(37): here

/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu(11): error: ambiguous "?" operation: second operand of type "const half" can be converted to third operand type "<error-type>", and vice versa
          detected during:
            instantiation of "void op_clamp(const T *, T *, T, T, int) [with T=half]"
(17): here
            instantiation of "void clamp_cuda(const T *, T *, T, T, int, cudaStream_t) [with T=half]"
(37): here

[ 46%] Building CUDA object ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/scale.cu.o
5 errors detected in the compilation of "/var/www/stats/stable-diffusion.cpp/ggml/src/ggml-cuda/clamp.cu".
gmake[2]: *** [ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/build.make:147: ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/clamp.cu.o] Error 1
gmake[2]: *** Waiting for unfinished jobs....
gmake[1]: *** [CMakeFiles/Makefile2:402: ggml/src/ggml-cuda/CMakeFiles/ggml-cuda.dir/all] Error 2
gmake: *** [Makefile:149: all] Error 2
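
For context, here is a rough host-side C++ sketch of what the five errors at clamp.cu(11) seem to be about. The log only tells us that a "?" expression on half operands is involved and that the compiler cannot pick one of the many __half conversion operators; the actual source line is not shown above, so this is an illustration, not the real ggml code. MockHalf, clamp_via_float and check_clamp_examples are hypothetical names made up for this example. In real CUDA code the explicit conversions would presumably go through __half2float and __float2half instead of static_cast.

```cpp
#include <cassert>

// Hypothetical stand-in for CUDA's __half: several implicit conversion
// operators and no comparison operators of its own (assumption: this is
// the situation nvcc is complaining about in the log).
struct MockHalf {
    float v;
    explicit MockHalf(float f) : v(f) {}
    operator float() const { return v; }
    operator int() const { return static_cast<int>(v); }
    operator bool() const { return v != 0.0f; }
};

// Writing the clamp directly on the half-like type does not compile:
//     return x < lo ? lo : (x > hi ? hi : x);
// because `x < lo` has to convert both operands to a built-in type and
// more than one conversion operator applies.

// Possible workaround: convert to float explicitly, clamp, convert back.
template <typename T>
T clamp_via_float(T x, T lo, T hi) {
    const float xf  = static_cast<float>(x);
    const float lof = static_cast<float>(lo);
    const float hif = static_cast<float>(hi);
    return T(xf < lof ? lof : (xf > hif ? hif : xf));
}

// Small self-check covering above-range, below-range and in-range inputs.
inline void check_clamp_examples() {
    const MockHalf lo(0.0f), hi(1.0f);
    assert(static_cast<float>(clamp_via_float(MockHalf(5.0f), lo, hi)) == 1.0f);
    assert(static_cast<float>(clamp_via_float(MockHalf(-2.0f), lo, hi)) == 0.0f);
    assert(static_cast<float>(clamp_via_float(MockHalf(0.5f), lo, hi)) == 0.5f);
}
```

The same template also works for plain float, so a change along these lines would not need a separate code path for each type. Whether upstream ggml fixed it this way is not something the log can confirm.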
Apr 02 '25 14:04 ServeurpersoCom
OK, I pulled the latest ggml submodule from llama.cpp and the CUDA build now works on stable-diffusion.cpp :) The submodule in this project needs an update :)
Apr 02 '25 15:04 ServeurpersoCom
Now we need to check out 8b9cc7cdd8a0dcf0176c60c755322c95b5965299 to get the last working ggml, because Georgi is working on it.
Apr 12 '25 14:04 ServeurpersoCom
Any update on this? It seems you can't build with GCC 14 or GCC 15. Trying to work around it just throws more errors.
Jun 02 '25 05:06 Tamalero