tortoise.cpp
latest ggml version sync
This is an attempt to rebase on the latest commit of the ggml master branch. My primary goal is to add Vulkan/OpenCL support, as I only have AMD GPUs.
Tested and working well on CPU, but I don't have an NVIDIA card around, so I cannot test the CUDA backend.
This needs the following PR on ggml to work: https://github.com/balisujohn/ggml/pull/1
It'll be maybe 10 days before I can properly review this but it looks like really solid work and I will make sure it gets merged.
~One thing I'd like to preemptively request is an update to the readme listing the extent of support for Vulkan/OpenCL and adding compile instructions if any special instructions are necessary for those compile processes.~
Vulkan/OpenCL support would be extremely welcome but likely will require new implementations being added to ggml for some ops.
Agreed, support for other backends should come in separate PRs to avoid polluting this one. It will also take me more time, as it's the very first time I'm dealing with compute shaders and tensors.
Tell me if I'm wrong, but porting pad_reflect_1d, unfold_1d and conv_transpose_1d should be enough.
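For anyone porting these ops to a new backend, a plain CPU reference can be handy for checking shader output. Below is a minimal sketch of 1D reflection padding, assuming the op follows the usual PyTorch-style convention in which the edge sample itself is not repeated. `pad_reflect_1d_ref` is a hypothetical helper name and is not part of ggml; it works on a single row of floats rather than on ggml tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference helper, NOT ggml's implementation.
// Assumption: reflection excludes the edge sample, so
// [1, 2, 3, 4] padded with p0 = 2, p1 = 1 becomes [3, 2, 1, 2, 3, 4, 3].
std::vector<float> pad_reflect_1d_ref(const std::vector<float>& x,
                                      std::size_t p0, std::size_t p1) {
    const std::size_t n = x.size();
    // Reflection without repeating the edge needs each pad to be < n.
    assert(n >= 2 && p0 < n && p1 < n);

    std::vector<float> out;
    out.reserve(p0 + n + p1);

    // Left pad: mirror around x[0], walking from x[p0] down to x[1].
    for (std::size_t i = 0; i < p0; ++i) {
        out.push_back(x[p0 - i]);
    }

    // Original samples, unchanged.
    out.insert(out.end(), x.begin(), x.end());

    // Right pad: mirror around x[n - 1], walking from x[n - 2] downwards.
    for (std::size_t i = 0; i < p1; ++i) {
        out.push_back(x[n - 2 - i]);
    }
    return out;
}
```

Running the same input through the backend kernel and through this reference, one row at a time, would show whether a mismatch comes from the padding op or from something further down the graph.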
I expect it will be similar to the ops needed to add Metal support, listed here: https://github.com/balisujohn/tortoise.cpp/issues/14. But I haven't confirmed there aren't additional ops missing Vulkan/OpenCL implementations.
Vulkan can support macOS quite easily, right?
I have isolated the divergence in behavior on CPU to this op: https://github.com/balisujohn/tortoise.cpp/blob/f21e5d53e26fd5774845b3f3438b4a7e482ab31c/main.cpp#L2601. For some reason, this ggml_mul_mat op gives different outputs for the test input between the version of GGML in your commit and the version that the master branch of tortoise.cpp currently points to. This could be related to a change in how GGML implements matrix multiplication. Our options are either to create different test cases for GPU and CPU, changing the CPU test cases to match the current CPU behavior, or to isolate the divergence in ggml_mul_mat behavior to vanilla GGML versions and try to get it fixed upstream. I lean somewhat towards creating separate CPU and GPU tests so we can keep development moving.
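If the tests are split per backend, one way to keep them from breaking on small floating-point drift is to compare against the expected values with a tolerance instead of exact equality. This is only a sketch under assumptions: `all_close` is a hypothetical helper, not something that exists in tortoise.cpp or ggml, and the default tolerances are placeholders that would need tuning against real CPU and GPU outputs.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical test helper; not part of tortoise.cpp or ggml.
// Returns true when every element of `actual` is within
// atol + rtol * |expected| of the matching element of `expected`.
// The default tolerances are illustrative assumptions only.
bool all_close(const std::vector<float>& actual,
               const std::vector<float>& expected,
               float atol = 1e-4f, float rtol = 1e-3f) {
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const float diff = std::fabs(actual[i] - expected[i]);
        // Written as !(diff <= bound) so that a NaN difference fails the check.
        if (!(diff <= atol + rtol * std::fabs(expected[i]))) {
            return false;
        }
    }
    return true;
}
```

A per-backend test could then hold one expected vector for CPU and one for GPU and pass the appropriate one to `all_close`. A tolerance check cannot hide a genuine behavior change like the one described above; it only absorbs rounding-level differences.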
Just to note, I'm also having a hard time porting it to Vulkan: after implementing the missing *_1d() functions, the output is total garbage (white noise plus a buzzing sound). I don't know if it could be related.
Definitely the first thing I'd recommend trying is seeing whether any of the tests pass with the Vulkan process by leaving only one uncommented at a time; that could help isolate the divergence in the Vulkan process.
> Vulkan/OpenCL support would be extremely welcome but likely will require new implementations being added to ggml for some ops.
Isn't it easier to upstream it to ggml?
Wow, looking at this and quickly falling down a rabbit hole of all the updates to ggml now. Did something derail this merge?
If I recall correctly it was this: https://github.com/balisujohn/tortoise.cpp/pull/20#issuecomment-2308607816.