stable-diffusion.cpp

feat: support mmap for model loading

Open · wbruna opened this issue 2 weeks ago · 4 comments

Introduces a new --use-mmap flag that replaces model loading I/O operations with mmap + memcpy.
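To illustrate what "mmap + memcpy" means in place of ordinary reads, here is a minimal POSIX sketch, not the PR's actual code. The class `MappedFile` and its method `read_at` are hypothetical names. The idea is to map the whole model file read-only once, then serve each "read N bytes at offset X" request by copying out of the mapping instead of issuing a seek and a `read()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read-only, file-backed mapping (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) return;
        struct stat st {};
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            size_ = static_cast<size_t>(st.st_size);
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) addr_ = p; else size_ = 0;
        }
        // The mapping stays valid after the descriptor is closed.
        ::close(fd);
    }
    ~MappedFile() { if (addr_) ::munmap(addr_, size_); }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    bool ok() const { return addr_ != nullptr; }
    size_t size() const { return size_; }
    const uint8_t* data() const { return static_cast<const uint8_t*>(addr_); }

    // Stands in for "seek to offset, read n bytes into dst":
    // bounds-check, then memcpy [offset, offset + n) out of the mapping.
    bool read_at(size_t offset, void* dst, size_t n) const {
        if (!addr_ || offset > size_ || n > size_ - offset) return false;
        std::memcpy(dst, data() + offset, n);
        return true;
    }

private:
    void* addr_ = nullptr;
    size_t size_ = 0;
};
```

The destination buffer is still allocated by the loader, so the rest of the loading path does not change; only the source of the bytes does.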

In my tests, this speeds up model loading slightly, though the gain never exceeded half a second. Its primary benefit right now is validating the mmap backend implementation. Later, I plan to extend this to allow the mapped file to serve directly as weight storage for backends that use main memory.
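As a rough illustration of "the mapped file serving directly as weight storage", the sketch below hands out pointers into the mapping instead of copying. All names (`Mapping`, `map_file`, `TensorView`, `view_tensor`) are hypothetical and not part of stable-diffusion.cpp or ggml. It assumes tensor data sits at offsets aligned to the element type and that the backend only reads the weights, since the mapping is `PROT_READ`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical owner of a read-only, file-backed mapping; unmaps on destruction.
struct Mapping {
    void* addr = nullptr;
    size_t size = 0;
    Mapping() = default;
    Mapping(const Mapping&) = delete;
    Mapping& operator=(const Mapping&) = delete;
    ~Mapping() { if (addr) ::munmap(addr, size); }
};

// Hypothetical helper: map a whole file read-only. Returns nullptr on failure.
std::shared_ptr<Mapping> map_file(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) return nullptr;
    std::shared_ptr<Mapping> m;
    struct stat st {};
    if (::fstat(fd, &st) == 0 && st.st_size > 0) {
        size_t n = static_cast<size_t>(st.st_size);
        void* p = ::mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            m = std::make_shared<Mapping>();
            m->addr = p;
            m->size = n;
        }
    }
    ::close(fd);
    return m;
}

// Hypothetical zero-copy view: points straight into the mapping and keeps a
// reference to it, so the pages stay mapped while any view is alive.
template <typename T>
struct TensorView {
    std::shared_ptr<Mapping> owner;
    const T* data = nullptr;
    size_t count = 0;
};

template <typename T>
TensorView<T> view_tensor(const std::shared_ptr<Mapping>& m, size_t offset, size_t count) {
    TensorView<T> v;
    if (!m || offset > m->size) return v;
    if (count > (m->size - offset) / sizeof(T)) return v;  // out of range
    // The mapping base is page-aligned, so the element pointer is properly
    // aligned only if the file offset is a multiple of alignof(T).
    if (offset % alignof(T) != 0) return v;
    v.owner = m;
    v.data = reinterpret_cast<const T*>(static_cast<const uint8_t*>(m->addr) + offset);
    v.count = count;
    return v;
}
```

The trade-off this makes visible is ownership: with alloc+memcpy the backend owns its buffer, whereas here the weights live only as long as the mapping does, which is why the view holds a `shared_ptr` to it.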

I used a non-default flag to be extra safe, but we could arguably follow llama.cpp's approach, with a --no-mmap flag to disable it instead.

I was only able to test (and build...) it under Linux, so additional testing is very welcome 🙂

wbruna, Dec 06 '25 19:12

How much value would it be if llama.cpp exported the mmap stuff as a library?

Green-Sky, Dec 07 '25 11:12

> How much value would it be if llama.cpp exported the mmap stuff as a library?

I don't think it'd help that much right now. The mmap part itself is more or less straightforward; replacing the current alloc+memcpy code with an externally managed buffer will be much trickier.

wbruna, Dec 09 '25 01:12

Have you experimented with mmapping and then copying to the GPU? In my experience, I've restricted mmapping to CPU inference and loading only, because mmap followed by a copy to the GPU became a bottleneck for some reason (page size, possibly?).

valkarias, Dec 10 '25 10:12

> Have you experimented with mmapping and then copying to the GPU? In my experience, I've restricted mmapping to CPU inference and loading only, because mmap followed by a copy to the GPU became a bottleneck for some reason (page size, possibly?).

Not yet. Right now I'm just reusing the I/O buffer; adding a separate code path to deliver the mapped area directly to the backend only to avoid a memcpy seemed like too much change for too little potential gain.

That behavior you describe sounds... odd. At least on Linux, large dynamically allocated memory areas are backed by mmap anyway, so they should behave the same. Maybe it's a difference between file-backed and anonymous mappings.
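The two kinds of mapping mentioned here can be contrasted in a short POSIX sketch; the function names `map_anonymous`, `load_via_read` and `load_via_mmap` are hypothetical. An anonymous private mapping is what glibc's `malloc` uses for requests above its mmap threshold, and filling it with `read()` leaves every page resident before anything copies from it. A file-backed mapping is populated lazily: each page is faulted in from the page cache (or from disk, if not cached) on first touch. Whether that lazy faulting explains the GPU-copy slowdown reported above is only a guess consistent with the "maybe" in the comment, not something this sketch measures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Anonymous private mapping: no file behind it; pages are zero-filled on first
// touch. This is the kind of mapping malloc uses for large allocations.
void* map_anonymous(size_t n) {
    void* p = ::mmap(nullptr, n, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// read() path: copy the file into an anonymous buffer, so every page is
// already resident when a later stage copies from it.
void* load_via_read(const char* path, size_t n) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    void* buf = map_anonymous(n);
    size_t done = 0;
    while (buf && done < n) {
        ssize_t r = ::read(fd, static_cast<char*>(buf) + done, n - done);
        if (r <= 0) { ::munmap(buf, n); buf = nullptr; break; }  // error or short file
        done += static_cast<size_t>(r);
    }
    ::close(fd);
    return buf;
}

// mmap path: file-backed mapping; nothing is read up front, and each page is
// faulted in the first time it is accessed.
void* load_via_mmap(const char* path, size_t n) {
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return nullptr;
    void* p = ::mmap(nullptr, n, PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);
    return p == MAP_FAILED ? nullptr : p;
}
```

Both paths yield the same bytes; what differs is when the page faults happen, which is where a copy to the GPU from the mapped area could end up paying for the file I/O.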

wbruna, Dec 10 '25 12:12