AlphaAtlas
The dream (for my hardware) is being able to split the model between separate Vulkan devices... maybe splitting up layers like llama.cpp does (see the sketch below)? This would allow for hybrid IGP+GPU...
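For context, this is the kind of layer split llama.cpp already does today between CPU and a single GPU via its `-ngl`/`--n-gpu-layers` flag; a per-Vulkan-device split would generalize it. A minimal sketch of the existing behavior, assuming a local quantized model file:

```
# Existing llama.cpp layer split: offload the first 20 transformer
# layers to the GPU backend, keep the rest on CPU
./main -m ./models/7B/ggml-model-q4_0.bin -ngl 20 -p "Hello"
```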
I'm sure people (like me) will be uploading all sorts of variants to HF once they enable the llama profiling in the build script. It could be done right now...
There are a couple of things here:
- Their model uses a Vulkan target, so you can't use their HF weights.
- I believe the CLI app targets Vulkan as...
I look forward to it :+1:. I saw those huge autogenerated files and thought it might already be part of the script.
Try the CPU version of PyTorch. CUDA PyTorch was making me segfault for some reason, and it isn't needed.
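In case it helps anyone else, this is how I'd grab the CPU-only wheel (assuming pip and the official PyTorch CPU index URL):

```
# Install the CPU-only PyTorch build; no CUDA runtime needed
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Sanity check: CPU wheels report a "+cpu" version suffix, and
# torch.cuda.is_available() should print False
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```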
Happening to me on Linux as well, I think, via libllama.so from llama-cpp-python.
```
...
llama_print_timings: load time   = 9348.15 ms
llama_print_timings: sample time =  129.86 ms / 26 runs...
```
@ejones What lines did you comment out, exactly? It would be nice to work around this, since the commit is required for the new quant functionality.
Is there *any* API or example code of how to use it out in the wild? I can't even find a whitepaper or anything like that.
An issue is that inference either has to run entirely on the XPU (ruling out partial OpenCL/CUDA acceleration), or the XPU has to support zero-copy/unified memory to avoid cost-prohibitive copies....
Also, while I'm here, is simultaneous OpenBLAS/cuBLAS possible? I can't build with both at the same time, but it seems like OpenBLAS would be beneficial for CPU offloading unless...
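For reference, a sketch of how I'm building each backend separately (assuming the `LLAMA_OPENBLAS` and `LLAMA_CUBLAS` make flags from the llama.cpp README; combining them in a single build is what fails for me):

```
# Build with OpenBLAS only (faster CPU prompt processing)
make clean && make LLAMA_OPENBLAS=1

# Build with cuBLAS only (GPU offload); setting both flags
# at once does not produce a working build for me
make clean && make LLAMA_CUBLAS=1
```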