Daniel Han
Update: Hi, so I managed to test HF -> llama.cpp without Unsloth, to take Unsloth out of the picture. 1. '\n\n' is tokenized as [1734, 1734], unless I prompted it...
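If you want to reproduce this kind of check yourself, here's a minimal sketch using the Hugging Face tokenizer directly (the model id is an assumption for illustration; the ids you see depend on the tokenizer):

```python
# Minimal sketch: inspect how the raw string "\n\n" is tokenized on the
# HF side, to compare against what llama.cpp produces after conversion.
# The model id below is an assumption for illustration.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Encode the two-newline string without adding special tokens.
ids = tokenizer("\n\n", add_special_tokens=False).input_ids
print(ids, tokenizer.convert_ids_to_tokens(ids))
```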
It should be fixed!
Currently we do not support multi-GPU. Please use our Kaggle Llama-3 notebook as-is: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook - it does not work on 2x T4s.
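Since multi-GPU isn't supported, one way to stay safe on a 2x T4 machine is to pin the process to a single device before any CUDA-backed library loads. A minimal sketch (the device index 0 is an assumption):

```python
# Sketch: restrict the process to one GPU before importing anything that
# initializes CUDA, so libraries only ever see a single device.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # device index is an assumption

import torch
print(torch.cuda.device_count())  # should report 1
```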
Oh 2x Tesla T4 is the option!
Just select it
Oh :( Sorry about the issue
Ok great! Sorry about the issue!
Oh great you solved it!
Ye, sadly not 128K yet - it's on the roadmap though! It's not RoPE but some other scaling mechanism
Oh, `adapter_config.json` is the `config.json` equivalent for a LoRA adapter. If you're looking for Ooba inference or GGUF, please use our 16-bit merged saving instead
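For reference, a sketch of the 16-bit merged save with Unsloth - the model path and directory names are placeholders, and you should check the current Unsloth docs for the exact arguments:

```python
from unsloth import FastLanguageModel

# Load a fine-tuned LoRA adapter (path is a placeholder assumption).
model, tokenizer = FastLanguageModel.from_pretrained(
    "lora_model",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Merge the adapter into the base weights and save in 16-bit; this writes
# a standard config.json that Ooba or GGUF conversion tools can read.
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")
```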