apaz
I'm getting ready to take another swing at it. My idea of what to do so far:

1. Create functions in `utils.h` called `llama_load_buffer()`, `llama_save_buffer()`, and `llama_destroy_buffer()`. These will `mmap()`...
@jart It would double the disk usage, yes. But so does converting the model, and so does quantizing it. I think people are prepared for this. You're right though in...
@jart I have no idea how to support that in a portable way. I haven't dug too deep into it. I'm halfway through implementing part 1. The troubling thing is...
@jart I'm more lamenting the absurdity that there's no portable (C++11) way to find the size of a file. It truly baffles me. On POSIX there's `fstat`. On Windows...
The `mmap()`/`mlock()` changes in llama.cpp should be applicable here.
It would be best to take this up on https://github.com/ggerganov/ggml.
@oKatanaaa Switching between `std::ifstream` and `FILE*` should make no measurable difference. They are both tunable, conceptually do the exact same thing, and support (almost) exactly the same set of operations...
@oKatanaaa The branch is already in the repo. Just `git pull origin` and `git checkout mmap`.
Any updates or feedback on this @ggerganov?
Do we have any broader ideas for how this fits into the strategy for handling dynamic and data-dependent shapes? I was under the impression that this was just something...