Added the ability to use LLAMA_HIP_UMA
With an AMD APU (like my Ryzen 7940HX) it is possible to use "UMA" to extend VRAM with system RAM. In my case I can't allocate more than 4 GB of VRAM (a BIOS limit).
With the change discussed in https://github.com/ggerganov/llama.cpp/issues/7399, UMA may be as fast as dedicated VRAM (I can't do a full comparison because of the 4 GB VRAM limit on my config).
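For context, here is a minimal sketch of what a UMA allocation path with the coarse-grain advice from the linked issue could look like. This is not the actual llama.cpp source; the function name `alloc_device_memory` and the exact placement of the `LLAMA_HIP_UMA` guard are my assumptions:

```cpp
// Illustrative sketch only: function name and macro wiring are assumptions,
// not the real llama.cpp code.
#include <hip/hip_runtime.h>

static hipError_t alloc_device_memory(void ** ptr, size_t size, int device) {
#ifdef LLAMA_HIP_UMA
    // Allocate managed (unified) memory so the APU can back "VRAM" with system RAM.
    hipError_t err = hipMallocManaged(ptr, size);
    if (err == hipSuccess) {
        // Per the linked issue, marking the buffer coarse-grain is what brings
        // UMA throughput close to a dedicated-VRAM allocation.
        err = hipMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
    }
    return err;
#else
    // Regular dedicated-VRAM allocation path.
    return hipMalloc(ptr, size);
#endif
}
```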
I can (:crossed_fingers: ) make a PR here, but I need to know what the best way to make it available would be:
- a runtime option
- a fallback alloc (see the sketch after this list)
- enabled by default on some hardware...
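For the fallback option, a minimal sketch could look like this: try dedicated VRAM first, and only fall back to unified memory when the carve-out is exhausted (e.g. my 4 GB BIOS limit). The function name `alloc_with_uma_fallback` is hypothetical, not an existing API:

```cpp
// Hypothetical fallback allocator, not an existing llama.cpp function.
#include <hip/hip_runtime.h>

static hipError_t alloc_with_uma_fallback(void ** ptr, size_t size, int device) {
    hipError_t err = hipMalloc(ptr, size);  // fast path: dedicated VRAM
    if (err != hipErrorOutOfMemory) {
        return err;
    }
    err = hipMallocManaged(ptr, size);      // fallback: UMA / system RAM
    if (err == hipSuccess) {
        // Coarse-grain advice keeps the managed buffer fast on APUs.
        err = hipMemAdvise(*ptr, size, hipMemAdviseSetCoarseGrain, device);
    }
    return err;
}
```

The appeal of the fallback is that it needs no new flag and only kicks in when `hipMalloc` fails, so discrete GPUs keep their current behaviour; a runtime option would make the choice explicit instead.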