Zhang Peiyuan
We sample [100 iters](https://github.com/jzhang38/TinyLlama/blob/c53075b679c9a97f96562052689e0043120a5fd5/pretrain/tinyllama.py#L38) of val data for the actual validation. I believe this happens because a different partition of the val data gets sampled after the...
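For context, the relevant logic is just a validation loop capped at a fixed number of iterations. A minimal sketch (the names below are illustrative, not the actual `tinyllama.py` code):

```python
import torch

@torch.no_grad()
def validate(model, val_dataloader, eval_iters: int = 100):
    """Estimate val loss from only `eval_iters` batches.

    Because only a subset of the val set is seen, the reported number
    shifts whenever a different partition of val data gets sampled.
    """
    model.eval()
    losses = []
    for i, (input_ids, targets) in enumerate(val_dataloader):
        if i >= eval_iters:
            break
        logits = model(input_ids)
        loss = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), targets.view(-1)
        )
        losses.append(loss.item())
    model.train()
    return sum(losses) / len(losses)
```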
It actually takes up around 600MB on disk and around 700MB during inference, with activations taken into account (https://huggingface.co/TinyLlama/TinyLlama-1.1B-python-v0.1/blob/main/ggml-model-q4_0.gguf). I will update the readme.
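As a rough sanity check on the 600MB figure (assuming the usual GGML q4_0 block layout of 32 weights stored as 16 bytes of 4-bit values plus a 2-byte fp16 scale; not an exact accounting of the file):

```python
# Back-of-envelope size for a 1.1B-parameter model quantized to q4_0.
# Assumes the standard GGML q4_0 block: 32 weights -> 16 bytes of nibbles + 2-byte fp16 scale.
n_params = 1.1e9
weights_per_block = 32
bytes_per_block = 16 + 2
size_bytes = n_params / weights_per_block * bytes_per_block
print(f"~{size_bytes / 1e6:.0f} MB")  # ~619 MB, in line with the ~600MB observed on disk
```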
@TapendraBaduwal You can check out llama.cpp
@TapendraBaduwal I recommend checking out https://github.com/OpenAccess-AI-Collective/axolotl
I agree with @RonanKMcGovern here on the effectiveness of sliding window attention (even though I have not done an apples-to-apples comparison). Would appreciate it if someone could submit a PR...
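To make the idea concrete for anyone who wants to pick this up, here is a minimal sketch of a sliding-window causal mask (window size and function name are illustrative; this is not the TinyLlama implementation):

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask where True marks key positions a query may attend to.

    Query position i attends to keys in [i - window + 1, i], i.e. causal
    attention restricted to the most recent `window` tokens.
    """
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

# Example: with window=3, position 5 attends only to positions 3, 4, and 5.
print(sliding_window_causal_mask(6, 3).int())
```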
Closing this issue for now.
> Maybe you can try Mamba, rwkv and StripedHyena architecture?

If we have the compute.
Yes, we use the alignment handbook without changing any hyperparameters.
This is probably the repo you should look at: https://github.com/EleutherAI/lm-evaluation-harness
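For example, a minimal sketch with a Hugging Face checkpoint (assumes a recent lm-evaluation-harness, v0.4+, which exposes `simple_evaluate`; the model and tasks below are just placeholders):

```python
# Sketch: evaluate a Hugging Face checkpoint with lm-evaluation-harness.
# Assumes lm-eval v0.4+; swap in your own model path and tasks.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    tasks=["hellaswag", "arc_easy"],
)
print(results["results"])
```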
Follow the instructions at https://github.com/Dao-AILab/flash-attention to install flash-attn.
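Once it is installed, a quick sanity check along these lines should run (a sketch; flash-attn needs a CUDA GPU and fp16/bf16 inputs):

```python
# Sanity check that flash-attn is installed and callable.
import torch
from flash_attn import flash_attn_func

# (batch, seqlen, nheads, headdim), fp16 on GPU as flash-attn expects
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 128, 8, 64])
```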