gpt4all
Added CUDA and OpenCL support
This PR aims to add support for CUDA and OpenCL. Once ready, I'll need someone to test CUDA support since I don't own an Nvidia card myself.
Testing instructions
Just a warning: the old models that get downloaded automatically will not work properly with OpenCL. Currently, loading one makes the GUI freeze; handling that gracefully is a change on the GUI side that still needs to be done. The old llama.cpp simply doesn't support them.
Download a GGML model from https://huggingface.co/TheBloke and place it in your models folder. Make sure the filename starts with ggml-.
Warning: the GUI might attempt to load another model along the way and crash, since updating that behaviour won't be part of this PR. To prevent this, move the other models somewhere else.
To make the GUI actually use the GPU, you'll need to add either buildVariant = "cuda"; or buildVariant = "opencl"; after this line:
https://github.com/tuxifan/gpt4all/blob/dlopen_gpu/gpt4all-backend/llmodel.cpp#L69
We also need some people testing on Windows with AMD graphics cards! And some people on Linux testing on Nvidia.
This is a very important improvement but will have to be carefully tested.
We need to test that:
- GPU support works on Windows and Linux machines with Nvidia graphics cards.
- Either the chat client or one set of bindings can effectively utilize the support.
I'll be happy to test it on Windows 10, maybe even Linux. Nvidia 3060. Just ping me when you think it's in a good-enough state.
I'm happy to test as well. I have a Windows machine with a 3090.
I'd be happy to test on my Windows 10 machine. CUDA is installed already.
Edit: GPU is a 1660 Ti
Wonderful! Thanks everyone :slightly_smiling_face:
Here to help with testing on Windows 11, RTX 3090.
Here to help with testing on Windows 11, RTX 3060 Ti. Thanks everyone!
I've added testing instructions to the top post. :-)
Hello! Thanks for the hard work.
I'm on Linux with an Iris Xe integrated GPU (OpenCL compatible). Is there any chance of it working? I've forced buildVariant = "opencl" in the code as specified above. The backend and chat built without any errors.
But when I launch "chat", it just sits there forever doing nothing (neither consuming CPU nor RAM), with only the message "deserializing chats took: 0 ms".
I use the 13B snoozy model; it works perfectly on the main Nomic branch.
OK, I finally got it working!
First of all, I had the OpenCL libs and headers but not CLBlast (I overlooked the CMake warning). I built it from here, as the version included in my repos (Ubuntu 20.04) did not work: https://github.com/CNugteren/CLBlast
I also downloaded a new model (https://huggingface.co/TheBloke/samantha-13B-GGML/tree/main), as the snoozy one did not work (as you specified in the first message, sorry for reading too fast).
I now have a working OpenCL setup! Hope it helps others. Unfortunately it doesn't speed anything up, though :D (my integrated GPU is probably not well suited for that).
Any idea how I could speed it up?
qt.dbus.integration: Could not connect "org.freedesktop.IBus" to globalEngineChanged(QString)
deserializing chats took: 0 ms
llama.cpp: loading model from /opt/gpt4all/gpt4allgpu//ggml-samantha-13b.ggmlv3.q5_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 5120
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 40
llama_model_load_internal: n_layer = 40
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 8 (mostly Q5_0)
llama_model_load_internal: n_ff = 13824
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 13B
llama_model_load_internal: ggml ctx size = 0,09 MB
ggml_opencl: selecting platform: 'Intel(R) OpenCL HD Graphics'
ggml_opencl: selecting device: 'Intel(R) Gen12LP HD Graphics NEO'
ggml_opencl: device FP16 support: true
llama_model_load_internal: mem required = 10583,26 MB (+ 1608,00 MB per state)
My GPU capabilities (using the OpenCL API) are below:
GPU VRAM Size: 25440 MB
Number of Compute Units: 96
I use the 13B snoozy model; it works perfectly on the main Nomic branch.
Please reread the first message, the old models aren't supported :-)
Any idea how I could speed it up?
Nope. Integrated graphics are pretty much unsuitable for this. But this should be enough to show that it's working! Thank you for giving it a try :-)
Thank you for your answer, that's what I thought :(. Just out of curiosity, what would be the limiting factor for such an iGPU: RAM, because it's shared with the system (the GPU indeed has only 128 MB dedicated, from what I understand), or just the number of compute cores? Or something else?
I did more tests and noticed something strange: I'm using intel_gpu_top to watch GPU usage. The GPU is clearly used when I'm watching a 4K 60 fps video on YouTube (=> hardware acceleration), but it seems not to be used at all with GPT4All (GPU version). Am I missing something?
Sorry, it's taking a bit longer. I hadn't actually compiled anything with MSVC in a while -- and it looks like that's the way to go -- so I'm now dealing with some fun build errors (who doesn't love those!). Although I've already knocked a few down by installing/upgrading certain things.
Question: are there known minimum requirements for the things involved in the whole toolchain?
I'm now up-to-date with some things and using VS Community 2022 and Windows 10 SDK v10.0.20348.0, so newer than that isn't possible anyway (for win10). Still relying on an older CUDA SDK (v11.6), however. Might just have to go update that, too, if nothing else helps.
I should probably go and have a closer look at the llama.cpp project.
Yeah, CUDA setup should be documented in the llama.cpp repo.
I'm a bit reluctant to turn this into a troubleshooting session here -- in a pull request comment of all places -- but what I've seen so far might help others who want to try CUDA.
Well, it's quite weird with MSVC, to say the least. So far I've run into the following problems. This was still before the forced pull/merge yesterday, which has helped quite a bit now:
- Note: in all of the following, I've used ...\vcvarsall.bat x64, and I was trying to simply build the backend itself as a first step. I worked locally with a git fetch origin pull/746/head:trying-cuda; git checkout trying-cuda.
- Some earlier problems got resolved by updating to Visual Studio 2022 and the latest Windows 10 SDK. I'm not going to go into detail about those. I'm still on CUDA v11.6, however. It doesn't seem to be a problem, after all.
- Many errors in gpt4all-backend\llama.cpp-mainline\ggml-cuda.cu with the message: error : expected an expression
  - This was a very puzzling error initially, because the GGML_CUDA_DMMV_X / GGML_CUDA_DMMV_Y it pointed to were simple #defines in the code. Turns out that for some reason, these #defines are overridable through cmake compiler options and are actually set in the config -- only those settings were somehow not passed through in the end. Resolved by manually editing the relevant .vcxproj file and changing all relevant compiler invocations.
  - Resolved. This doesn't happen anymore since the force push.
- Warning about a feature requiring C++ standard 20.
  - Fixed by editing CMakeLists.txt and replacing set(CMAKE_CXX_STANDARD 17) with set(CMAKE_CXX_STANDARD 20).
  - Resolved. This isn't necessary anymore since the force push/merge.
- Minor problem: warning C5102: ignoring invalid command-line macro definition '/arch:AVX2' -- but /arch:AVX2 is a perfectly valid flag in MSVC.
  - I've figured out why it happens: the flag follows a /D, but it is not about setting a macro definition; it's a valid flag by itself. I have not figured out why it's generated that way, though.
  - Doesn't occur when compiling the main branch, it seems?
  - Still happens after the force push/merge.
- Main problem: build errors in many projects: error MSB3073: ... <many script lines omitted> ... :VCEnd" exited with code -1073741819.
  - Code -1073741819 is hexadecimal 0xC0000005, which seems to be the code for an access violation. Yikes. Did my compiler just crash?
  - Found this and this as potentially talking about the same problem. The former is a downvoted and unanswered SO question, and the latter says to disable the /GL compiler flag (not tried before the force push).
  - Still seeing these errors after the force push/merge.
  - So far I did everything on the command line. This was somehow resolved by opening the .sln in Visual Studio and building the whole thing twice (after the first run showed the same errors). (???)
(Of course, I cannot exclude the possibility that all of this is yet another case of PEBKAC.)
=> So now I have managed to have a compiled backend, at last.
P.S. I could also try compiling everything with a MinGW setup (I prefer MSYS2 MinGW here). Is that something that's supposed to be supported in the future? I've invested quite some time to help troubleshoot problems there (mainly in 758, 717 and 710) and I guess it's not a good user experience -- but that also has to do with the Python bindings package.
A compile issue on MSVC has been found and will be solved soon, @cosmic-snow! Will notify you when there's more.
Oh really? That's good to know. But not urgent, because here's where I am now:
- I tried compiling the backend by itself, so I might get away with just testing through the Python bindings.
- Turns out the C API has changed, too. So I decided to finally do the full setup and download Qt Creator.
- Some time and a few gigabytes later, it wasn't very hard to configure; most of the things were set correctly out of the box (I did have to compile this one twice, too, but that's a minor inconvenience). The only thing I changed was CMAKE_GENERATOR to Visual Studio 17 2022.
- I had already prepared mpt-7b-instruct.ggmlv3.q4_1.bin, which I renamed to ggml-mpt-7b-instruct.ggmlv3.q4_1.bin, downloaded from https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML/tree/main. This did not get recognised correctly:
  - gptj_model_load: invalid model file ... (bad vocab size 2003 != 4096) and GPT-J ERROR: failed to load model, although of course it's not a GPT-J model.
- I then downloaded Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin, renamed to ggml-Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin (https://huggingface.co/TheBloke/Wizard-Vicuna-7B-Uncensored-GGML/tree/main). And this works.

However, it did not seem to use my GPU, despite me setting buildVariant = "cuda";, so that's what I'm looking into at the moment.
Edit: It's clearly doing work in the CUDA-enabled library (the cut-off name is ggml_graph_compute).
Edit2: Added a simple debugging printf at line 9786 in ggml.c, and it looks like the check ggml_cuda_can_mul_mat(...) is simply never true in my case. Maybe I need a different model? But that's just a guess. To really understand what's going on I'd need to spend more time understanding llama.cpp.
Edit3: Added set(CMAKE_AUTOMOC OFF) to the beginning of gpt4all-backend/CMakeLists.txt. This makes it easier for me to understand the compilation output and shouldn't mess anything up, I think (but I'm no expert here). Aside: it'd probably be better not to set it ON globally in the chat CMakeLists.txt, but only for the targets that actually use Qt. That might improve build speed slightly, too.
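(As an aside illustration, not part of the PR: a minimal sketch of that per-target approach, assuming the Qt application target is called chat -- the actual target names may differ.)

```cmake
# Hypothetical sketch only: instead of a global set(CMAKE_AUTOMOC ON) in the chat
# CMakeLists.txt, enable the Qt code generators just for the Qt-based target.
# "chat" is an assumed target name; adjust to whatever the project actually defines.
set_target_properties(chat PROPERTIES
    AUTOMOC ON   # run moc only for this target
    AUTOUIC ON   # run uic only for this target
    AUTORCC ON)  # run rcc only for this target
```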
Edit4: One thing that feels odd is that the macro definition GGML_USE_CUBLAS is only ever activated in the compiler options of ggml.c, but llama.cpp (the file, not the project) has an #ifdef section depending on it. I'm talking about mainline here, but I think the other targets have that, too.
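(For illustration only, a hedged sketch of what adding that define for the llama target could look like in llama.cpp.cmake; the GGML_CUBLAS_USE condition and the LLAMA_CUDA_DMMV_* variables are taken from the snippet quoted later in this thread, while the llama${SUFFIX} target name is an assumption.)

```cmake
# Hypothetical sketch: mirror the CUDA-related defines that ggml already gets
# onto the llama target, so llama.cpp's #ifdef GGML_USE_CUBLAS section is compiled in.
if (GGML_CUBLAS_USE)
    target_compile_definitions(llama${SUFFIX} PRIVATE
        GGML_USE_CUBLAS
        GGML_CUDA_DMMV_X=${LLAMA_CUDA_DMMV_X}
        GGML_CUDA_DMMV_Y=${LLAMA_CUDA_DMMV_Y})
endif()
```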
@cosmic-snow thanks for the testing efforts!! Please note that MPT/GPT-J isn't supported in the new GGML formats yet. I have added the missing compile defines to the CMake file for llama, please try again now. :-)
I'm getting the error:
CMake Error at llama.cpp.cmake:280 (target_compile_definitions):
Cannot specify compile definitions for target "llama-230511-cuda" which is
not built by this project.
Call Stack (most recent call first):
CMakeLists.txt:90 (include_ggml)
Note: I just copied your most recent changes over, not going through Git. Not sure if that changed any line numbers, but the error should be clear: CUDA isn't present yet in that version.
I think I've seen some conditionals like that in CMakeLists.txt. Maybe I can fix it myself.
Edit: I was mistaken, a previous build produced a llama-230511-cuda.dll. Sorry, it's probably better to just start from a clean slate again.
Edit2: Trying again with a clean version of the patchset helped already, but now I'm getting the GGML_CUDA_DMMV_X / GGML_CUDA_DMMV_Y error again, which I thought was resolved. Although I can see they're supposed to be defined in the CMake files, in the compiler string for ...\llama.cpp-mainline\ggml-cuda.cu they show up empty: ... -DGGML_CUDA_DMMV_X= -DGGML_CUDA_DMMV_Y= ... I'm starting to think it's something on my end I'm missing here.
Edit3: Maybe it's an ordering problem now in how the CMakeLists.txt files get read? Copying the following from ...\llama.cpp-mainline\CMakeLists.txt to right before they're used in llama.cpp.cmake fixed that particular error:
set(LLAMA_CUDA_DMMV_X "32" CACHE STRING "llama: x stride for dmmv CUDA kernels")
set(LLAMA_CUDA_DMMV_Y "1" CACHE STRING "llama: y block size for dmmv CUDA kernels")
if (GGML_CUBLAS_USE)
target_compile_definitions(ggml${SUFFIX} PRIVATE
GGML_USE_CUBLAS
GGML_CUDA_DMMV_X=${LLAMA_CUDA_DMMV_X}
GGML_CUDA_DMMV_Y=${LLAMA_CUDA_DMMV_Y})
...
Edit4: Something is still decidedly wrong here. I'm now getting a linker error (in short, it doesn't find the LLModel::construct() symbol) when trying to build the chat application, and that doesn't look like something that was even touched by your previous commit. I know where its implementation is, but somehow the llmodel.dll just winds up empty now; at least that's what inspecting it with 'DLL Export Viewer' says. I successfully built that on the main branch yesterday and can see the symbol in that version's DLL.
I'll keep trying for a bit, but I guess I ultimately need to figure out what's wrong with the build process as a whole here.
I apologize, there was a little mistake in the llama.cpp.cmake :-)
That should be solved now. Again, thanks a lot for testing all this!
You're welcome. And yes, although I'm not going to pull those fixes again right now, that looks like it solves that particular problem.
In the meantime I've managed to get it to work somehow, although I don't understand it yet. And I can confirm it was running on CUDA (still v11.6 instead of the latest v12.1), at least until it crashed.
Next, I guess I'll try to figure out:
- the build problems, esp. error MSB3073 with code -1073741819 / 0xC0000005, which seems to be the main culprit
- the /arch:AVX2 warning
Edit:
I think I've found the problem with /arch:AVX2. Here: https://github.com/nomic-ai/gpt4all/blob/e85908625f25190ad43f063979e0e95b889bc56b/gpt4all-backend/llama.cpp.cmake#L361-L363 it should be target_compile_options(... instead of target_compile_definitions(.... I was looking at ...\llama.cpp*\CMakeLists.txt this whole time, so it's no wonder I couldn't figure that one out.
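(To make the suggested change concrete, a hedged sketch follows; the target name and MSVC guard are assumptions, not the actual lines in llama.cpp.cmake. On MSVC, target_compile_definitions prefixes each item with /D, which is why the compiler reports an invalid macro definition; passing the flag through target_compile_options avoids that.)

```cmake
# Sketch only -- actual target names/conditions in llama.cpp.cmake may differ.
if (MSVC)
    # Before (assumed): the flag ends up emitted as "/D /arch:AVX2", triggering warning C5102.
    # target_compile_definitions(ggml${SUFFIX} PRIVATE /arch:AVX2)

    # After: pass it as a plain compiler option instead.
    target_compile_options(ggml${SUFFIX} PRIVATE /arch:AVX2)
endif()
```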
Edit2:
Regarding the build problems, I've figured at least something out: if, after compiling everything twice, the llmodel.dll ends up empty, then manually opening its Visual Studio project, disabling /GL (as mentioned above and recommended here), and recompiling it by itself fixes the problem.
Edit3:
Maybe also bump the version number? https://github.com/nomic-ai/gpt4all/blob/e85908625f25190ad43f063979e0e95b889bc56b/gpt4all-backend/CMakeLists.txt#L19-L21 The new C API is not compatible with the previous one; otherwise I could've just tested the backend with the Python bindings.
Edit4:
So I guess the /GL setting was the problem in all the projects that failed with error MSB3073 ... and had to be built twice. As a workaround, I've added set(IPO_SUPPORTED OFF) right after the following: https://github.com/nomic-ai/gpt4all/blob/e85908625f25190ad43f063979e0e95b889bc56b/gpt4all-backend/CMakeLists.txt#L31-L38 Note: I'm not suggesting it should be turned off permanently for MSVC; maybe I or someone else can figure out why it behaves like that and come up with a proper fix. I did try with only set(LLAMA_LTO OFF) at first, but that was not enough.
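(For reference, a hedged sketch of that workaround; the surrounding check_ipo_supported() call is assumed from the linked CMakeLists.txt lines, and scoping the override to MSVC is an extra assumption rather than what was actually committed.)

```cmake
# Workaround sketch only: force interprocedural optimization off so /GL is never
# added on MSVC, avoiding the intermittent MSB3073 / 0xC0000005 build failures.
include(CheckIPOSupported)
check_ipo_supported(RESULT IPO_SUPPORTED OUTPUT IPO_ERROR)
if (MSVC)
    set(IPO_SUPPORTED OFF)
endif()
```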
Has conflicts and is outdated. Should it be closed?
No! Back on track :-)
@cosmic-snow lots of stuff has happened, most significantly CMake fixes. I'd suggest trying again now, if you want :+1:
Alright.
I thought I'd do the standard thing I do these days when just updating main, which is making a backend build just by itself at first (from within MSYS2; MinGW64):
cd gpt4all-backend; mkdir build && cd build
cmake .. # then cmake --build .
This already failed, because the CUDA build doesn't work when I'm only running a MinGW build:
Details
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: AMD64
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.6/include (found version "11.6.55")
CMake Error at C:/dev/env/msys64/mingw64/share/cmake/Modules/CMakeDetermineCompilerId.cmake:751 (message):
Compiling the CUDA compiler identification source file
"CMakeCUDACompilerId.cu" failed.
Compiler: C:/Program Files/NVIDIA GPU Computing
Toolkit/CUDA/v11.6/bin/nvcc.exe
Build flags:
Id flags: --keep;--keep-dir;tmp -v
The output was:
1
nvcc fatal : Cannot find compiler 'cl.exe' in PATH
Call Stack (most recent call first):
C:/dev/env/msys64/mingw64/share/cmake/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_CO
MPILER_ID_BUILD)
C:/dev/env/msys64/mingw64/share/cmake/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compi
ler_id_test)
C:/dev/env/msys64/mingw64/share/cmake/Modules/CMakeDetermineCUDACompiler.cmake:307 (CMAKE_DETERMIN
E_COMPILER_ID)
CMakeLists.txt:56 (enable_language)
-- Configuring incomplete, errors occurred!
Not a show-stopper, but something to keep in mind.
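(As an illustrative sketch, not something in the PR: one way such a configure failure could be avoided for CPU-only MinGW builds, assuming the backend currently calls enable_language(CUDA) unconditionally, is to probe for a working CUDA compiler first.)

```cmake
# Hypothetical guard: only enable the CUDA language (and the CUDA build variant)
# when a usable nvcc/host-compiler combination is actually available.
include(CheckLanguage)
check_language(CUDA)
if (CMAKE_CUDA_COMPILER)
    enable_language(CUDA)
else()
    message(STATUS "No usable CUDA compiler found -- skipping the CUDA build variant")
endif()
```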
Then I did a regular build inside Qt Creator. It went without a problem and I could run it (but it only used the CPU).
After that, I edited the llmodel.cpp source to add buildVariant = "cuda"; as required and rebuilt it again. That went fine as well, but my NVIDIA GPU still showed no load after that. I thought the problem was that I tried with the current Hermes. Selecting a different model folder (where I stored the model for the previous test) somehow didn't work, although the one I use normally already isn't the standard one. Finally, I moved the model over to the default folder -- but still didn't get any load on my GPU.
So that's where I've left it. Can't really say what needs to be done now to switch it to GPU. 🤔
CMake Error at C:/dev/env/msys64/mingw64/share/cmake/Modules/CMakeDetermineCompilerId.cmake:751 (message):
Compiling the CUDA compiler identification source file
"CMakeCUDACompilerId.cu" failed.
I have the alleged fix from here: https://github.com/NVlabs/instant-ngp/issues/923
set CUDATOOLKITDIR = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1
Adding -D CMAKE_CUDA_COMPILER=$(which nvcc) to cmake fixed this for me: cmake . -D TCNN_CUDA_ARCHITECTURES=86 -D CMAKE_CUDA_COMPILER=$(which nvcc) -B build
Perhaps applying that fix will resolve the error. It would also be helpful if you uploaded your files with the latest changes here, even if they don't work, so other people could have a look at why it doesn't work.
@jensdraht1999 Thanks for trying to help.
But I wasn't really looking for a fix for that. It was more meant as a note to what happens in a MinGW backend build. I'd expect that to just keep working as it is now if CUDA is not properly configured for it. (It's what's generally used for the bindings on Windows.)
What really matters at the moment is the Qt Creator build, which I've configured for CUDA and which uses MSVC instead. That was the one through which I got CUDA working previously, and it already worked with v11.6 instead of v12.1 of the CUDA toolkit, so there shouldn't be a need to upgrade.
I feel like it might even be an advantage to have confirmation that it works with an older version, for compatibility reasons.
Btw, what you can do to help is just try to build & run it yourself on one or more platforms, then document whether it works or whether you ran into trouble of some sort. That's all I'm doing at the moment, too.