Igor Kilbas
Tried to address slow weight loading. 7B is okay, but 13B is really slow (several minutes), which makes it hard to experiment and prototype with larger models. Replaced `std::ifstream` with C-style file reading using `fopen`....
I was tinkering with the code and made the following change at `line 977, main.cpp` (as it seemed wrong to me), *from*:

```C
if (embd.size() > params.n_batch) {
    break;
}
```

...
The code in the PR lets you run llama the first time, but the second time the program crashes. This is due to a memory access violation when trying to access any...
First of all, many thanks for your great work! Unsloth is amazing.

---

Can't merge LoRA adapters into a Mistral model after training. Code:

```python
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name...
```
I've trained Gemma 2B in 16-bit with LoRA. With the adapters loaded separately, everything works just fine. But after merging the adapters, the model becomes literally unusable. On the screenshot:...