Zhang Peiyuan
We sample [100 iters](https://github.com/jzhang38/TinyLlama/blob/c53075b679c9a97f96562052689e0043120a5fd5/pretrain/tinyllama.py#L38) of val data for the actual validation. I believe this happens because a different partition of the val data gets sampled after the...
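For context, the relevant logic is just a validation loop capped at a fixed number of iterations. A minimal sketch (the names below are illustrative, not the actual `tinyllama.py` code):

```python
import torch

@torch.no_grad()
def validate(model, val_dataloader, eval_iters: int = 100):
    """Estimate val loss from only `eval_iters` batches.

    Because only a subset of the val set is seen, the reported number
    shifts whenever a different partition of val data gets sampled.
    """
    model.eval()
    losses = []
    for i, (input_ids, targets) in enumerate(val_dataloader):
        if i >= eval_iters:
            break
        logits = model(input_ids)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1)
        )
        losses.append(loss.item())
    model.train()
    return sum(losses) / len(losses)
```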
It actually takes up around 600MB on disk and around 700MB during inference, with activations taken into account (https://huggingface.co/TinyLlama/TinyLlama-1.1B-python-v0.1/blob/main/ggml-model-q4_0.gguf). I will update the readme.
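As a rough sanity check on the 600MB figure (assuming the usual GGML q4_0 block layout of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale; not an exact accounting of the file):

```python
# Back-of-envelope size for a 1.1B-parameter model quantized to q4_0.
# Assumes the standard GGML q4_0 block: 32 weights -> 16 bytes of nibbles + 2-byte fp16 scale.
n_params = 1.1e9
weights_per_block = 32
bytes_per_block = 16 + 2
size_bytes = n_params / weights_per_block * bytes_per_block
print(f"~{size_bytes / 1e6:.0f} MB")  # ~619 MB, in line with the ~600MB observed on disk
```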
@TapendraBaduwal You can check out llama.cpp
@TapendraBaduwal I recommend checking out https://github.com/OpenAccess-AI-Collective/axolotl
I agree with @RonanKMcGovern here on the effectiveness of sliding window attention (even though I have not done an apples-to-apples comparison). Would appreciate it if someone could submit a PR...
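To make the idea concrete for anyone who wants to pick this up, here is a minimal sketch of a sliding-window causal mask (window size and function name are illustrative; this is not the TinyLlama implementation):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query position i attends to keys in [i - window + 1, i], i.e. causal
    attention restricted to the most recent `window` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

# Example: with window=3, position 5 attends only to positions 3, 4, and 5.
print(sliding_window_causal_mask(6, 3).int())
```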
Closing this issue for now.
> Maybe you can try Mamba, rwkv and StripedHyena architecture?

If we have the compute.
Yes, we use the alignment handbook without changing any hyperparameters.
This is probably the repo you should look at: https://github.com/EleutherAI/lm-evaluation-harness
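For example, a minimal sketch with a Hugging Face checkpoint (assumes a recent lm-evaluation-harness, v0.4+, which exposes `simple_evaluate`; the model and tasks below are just placeholders):

```python
# Sketch: evaluate a Hugging Face checkpoint with lm-evaluation-harness.
# Assumes lm-eval v0.4+; swap in your own model path and tasks.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tasks=["hellaswag", "arc_easy"],
)
print(results["results"])
```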
Follow the instructions at https://github.com/Dao-AILab/flash-attention to install flash-attn.
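Once it is installed, a quick sanity check along these lines should run (a sketch; flash-attn needs a CUDA GPU and fp16/bf16 inputs):

```python
# Sanity check that flash-attn is installed and callable.
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim), fp16 on GPU as flash-attn expects
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 128, 8, 64])
```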