AlphaAtlas
The dream (for my hardware) is being able to split the model between separate Vulkan devices... maybe splitting up layers like llama.cpp does (see the sketch below)? This would allow for hybrid IGP+GPU...
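For context, this is the kind of layer split llama.cpp already does today between CPU and a single GPU via its `-ngl`/`--n-gpu-layers` flag; a per-Vulkan-device split would generalize it. A minimal sketch of the existing behavior, assuming a local quantized model file:

```
# Existing llama.cpp layer split: offload the first 20 transformer
# layers to the GPU backend, keep the rest on CPU
./main -m ./models/7B/ggml-model-q4_0.bin -ngl 20 -p "Hello"
```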
I'm sure people (like me) will be uploading all sorts of variants to HF once they enable the llama profiling in the build script. It could be done right now...
There are a couple of things here:
- Their model uses a Vulkan target, so you can't use their HF weights.
- I believe the CLI app targets Vulkan as...
I look forward to it :+1:. I saw those huge autogenerated files and thought it might already be part of the script.
Try the CPU version of PyTorch. CUDA PyTorch was making me segfault for some reason, and it isn't needed.
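In case it helps anyone else, this is how I'd grab the CPU-only wheel (assuming pip and the official PyTorch CPU index URL):

```
# Install the CPU-only PyTorch build; no CUDA runtime needed
pip install torch --index-url https://download.pytorch.org/whl/cpu

# Sanity check: CPU wheels report a "+cpu" version suffix, and
# torch.cuda.is_available() should print False
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```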
Happening to me on Linux as well, I think, via libllama.so from llama-cpp-python.
```
...
llama_print_timings: load time   = 9348.15 ms
llama_print_timings: sample time =  129.86 ms / 26 runs...
```
@ejones What lines did you comment out, exactly? It would be nice to work around this, since the commit is required for the new quant functionality.
Is there *any* API or example code of how to use it out in the wild? I can't even find a whitepaper or anything like that.
An issue is that inference either has to run entirely on the XPU (ruling out partial OpenCL/CUDA acceleration), or the XPU has to support zero-copy/unified memory to avoid cost-prohibitive copies....
Also, while I'm here, is simultaneous OpenBLAS/cuBLAS possible? I can't build with both at the same time, but it seems like OpenBLAS would be beneficial for CPU offloading unless...
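For reference, a sketch of how I'm building each backend separately (assuming the `LLAMA_OPENBLAS` and `LLAMA_CUBLAS` make flags from the llama.cpp README; combining them in a single build is what fails for me):

```
# Build with OpenBLAS only (faster CPU prompt processing)
make clean && make LLAMA_OPENBLAS=1

# Build with cuBLAS only (GPU offload); setting both flags
# at once does not produce a working build for me
make clean && make LLAMA_CUBLAS=1
```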