llama-dfdx
LLaMa 7b with CUDA acceleration implemented in Rust. Minimal GPU memory needed!
```
ubuntu@instance-20230508-1136:~/repos/llama-dfdx$ ./target/release/llama-dfdx --model llama-7b-hf --disable-cache generate "Why is pi round?"
Detected model folder as LLaMa 7b.
Model size: 13476 MB
13476 MB of model parameters will be held in...
```
Blocking questions:
1. Is it safe to `std::mem::forget` the tensor? Is that all we need to do? What about the tensor's other fields?
2. Is the `Vec::from_raw_parts_mut` usage safe?
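For context on question 2, the general contract (independent of this crate's tensor internals) is that rebuilding a `Vec` from raw parts is only sound when the pointer, length, and capacity all come from a `Vec` of the same element type, and the original is forgotten so the allocation has exactly one owner. A minimal sketch of that pairing:

```rust
fn main() {
    let mut v = vec![1u32, 2, 3];
    let (ptr, len, cap) = (v.as_mut_ptr(), v.len(), v.capacity());

    // Forget the original so its destructor won't free the buffer;
    // ownership is about to be transferred to the rebuilt Vec.
    std::mem::forget(v);

    // SAFETY: ptr/len/cap were just taken from a live Vec<u32> that we
    // forgot above, so exactly one owner frees the allocation.
    let rebuilt = unsafe { Vec::from_raw_parts(ptr, len, cap) };
    assert_eq!(rebuilt, [1, 2, 3]);
}
```

If the tensor type carries other fields that own resources (as question 1 asks), forgetting the whole struct leaks those too, which is why the pairing above forgets only the `Vec` whose buffer is being reused.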
- llama is a generation model, so it can't really be used for chat
- vicuna is a chatbot
- alpaca is an instruction model

If not able to determine the "mode", a user could...
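One way to pick among these three modes is to sniff the model folder name. This is a hypothetical sketch, not llama-dfdx's actual API; the `Mode` enum and `detect_mode` function are illustrative names:

```rust
// Hypothetical sketch: infer a generate/chat/instruct "mode" from the
// model folder name. Not part of llama-dfdx.
#[derive(Debug, PartialEq)]
enum Mode {
    Generate, // plain LLaMa: free-form text generation
    Chat,     // vicuna-style chatbot
    Instruct, // alpaca-style instruction following
}

fn detect_mode(model_folder: &str) -> Option<Mode> {
    let name = model_folder.to_lowercase();
    if name.contains("vicuna") {
        Some(Mode::Chat)
    } else if name.contains("alpaca") {
        Some(Mode::Instruct)
    } else if name.contains("llama") {
        Some(Mode::Generate)
    } else {
        None // unknown: fall back to asking the user
    }
}

fn main() {
    assert_eq!(detect_mode("llama-7b-hf"), Some(Mode::Generate));
    assert_eq!(detect_mode("vicuna-13b-v1.1"), Some(Mode::Chat));
    assert_eq!(detect_mode("gpt2"), None);
}
```

Returning `None` for unrecognized folders leaves room for whatever fallback the truncated sentence above proposes, such as prompting the user.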
Alpaca 7b should have the exact same structure, so as long as you can convert the weights into the same format with `convert.py`, it should be runnable out of the...
Use cases:
1. You can fit the whole model into GPU RAM
2. You can fit part of the model into GPU RAM
3. You need to keep all the model...
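A rough way to reason about case 2 is per-layer: keep as many layers on the GPU as the VRAM budget allows and run the rest from host memory. The 13476 MB total comes from the log above; the 32-layer split and the budgeting function are assumptions for illustration:

```rust
// Sketch: how many transformer layers fit in a VRAM budget?
// The per-layer size and layer count are illustrative assumptions,
// not values read from llama-dfdx.
fn layers_that_fit(vram_mb: u64, mb_per_layer: u64, num_layers: u64) -> u64 {
    (vram_mb / mb_per_layer).min(num_layers)
}

fn main() {
    // LLaMa 7b is ~13476 MB total; assuming 32 layers => ~421 MB each.
    let per_layer = 13476 / 32; // 421 MB
    // With an 8 GB card only part of the model fits (use case 2):
    let on_gpu = layers_that_fit(8192, per_layer, 32);
    println!("{on_gpu} of 32 layers on GPU");
    // A 16 GB card holds everything (use case 1):
    assert_eq!(layers_that_fit(16384, per_layer, 32), 32);
}
```

Case 3 corresponds to a budget too small for even one layer, where weights stream through GPU memory instead of residing there.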