Igor Kilbas
> Is this faster than using mmap (see #91 / #150)? Just tested out #150 implementation, it was a lot slower than simple `fopen`. I believe #150 should've been faster...
@apaz-cli Thanks for the input. I initially thought the same because performance comparisons for `fstream vs fopen` I found were not really consistent (and very speculative) and depended on a...
@jart I was using the `mmap` branch. I believe I tried #150 earlier and it didn't work well for me. (1) I am on a Windows 11 machine, but building...
@jart I'd be glad to help with debugging `mmap` for WIN32. What's needed right now is to clone the `mmap` branch, (try to) compile it using MSVC and...
> @oKatanaaa do you still want to move forward on this PR? I'm still not convinced that c++ stl vs. c stdio is going to make a measurable difference, unless...
Also, I think it's time to add comments to the code (mmap specifically), as it is already becoming stupidly complicated and has a lot of magic constants. It simply becomes harder...
> > ... > > Again if the case is that more processes is what is wanted and an ability to share the state between them, a more general approach...
@nicknitewolf I looked into [mman-win32](https://github.com/alitrack/mman-win32) and tried to use the source code. Unfortunately it doesn't work. It is pretty similar to what @jart had already written but breaks without Justine's...
My findings so far: 1. In the PEFT model `lm_head` and `embed_tokens` have different data pointers (is that okay?). But the tensors are equal. 2. In the merged model data...
> OHHH I forgot to say `get_chat_template` is broken for Gemma :( What you're looking for is `"gemma_chatml"` instead of `"chatml"`, and it'll auto auto `` and ``. You also...