Sean Owen
bf16 should still be used instead of fp16; it doesn't increase memory usage. That said, it's possible you do need more memory if your input tokenizes differently and you have long...
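For reference, a minimal sketch of loading in bf16 with transformers (the model name here is just a placeholder; bf16 uses the same 2 bytes per parameter as fp16 but keeps fp32's exponent range):

```python
import torch
from transformers import AutoModelForCausalLM

# bfloat16 instead of float16: same memory footprint per parameter,
# but a wider exponent range, so training/inference is more stable.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/pythia-2.8b",  # placeholder model id, for illustration
    torch_dtype=torch.bfloat16,
)
```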
Right, it does not make sense to fine-tune Pythia either; it was pre-trained on the Pile, which is mostly English text.
Yes, although this model was derived from Pythia, not LLaMA.
Hm, never seen that one. I am not sure the `nvidia-cublas-cu11` Python packages you installed are what you want for the NVIDIA libs. Have you installed the OS libs...
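One rough way to check from Python whether the system loader can actually see the CUDA libraries (the pip `nvidia-*` wheels alone may not put them where other native builds look):

```python
import ctypes.util

# find_library resolves against the OS loader path (ldconfig on Linux);
# None means the shared library isn't visible system-wide.
for lib in ("cublas", "cudart"):
    print(lib, "->", ctypes.util.find_library(lib))
```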
Do you have the python3-dev OS package installed?
I've seen a failure to build fused_adam because of missing libraries, but yours is different from what I've seen and from that issue. You're somehow missing the Python dev headers. Maybe...
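A quick sketch to confirm whether the dev headers are present; `Python.h` comes from the `python3-dev` (or versioned `python3.X-dev`) OS package, and native extensions like fused_adam can't compile without it:

```python
import os
import sysconfig

# Where this interpreter expects its C headers to live.
include_dir = sysconfig.get_paths()["include"]
print(include_dir, os.path.exists(os.path.join(include_dir, "Python.h")))
```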
Wrong torch version - use the versions in requirements.txt. It hints at this: `[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.1`
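If it helps, a small sanity check against the range in that warning (the exact pin to use is whatever requirements.txt says; `packaging` is assumed to be installed, as it usually is alongside pip):

```python
import torch
from packaging import version

# Strip any local build suffix like "+cu118" before comparing.
v = version.parse(torch.__version__.split("+")[0])
assert version.parse("1.5") <= v < version.parse("2.0"), (
    f"torch {v} is outside the supported range; "
    "install the version pinned in requirements.txt"
)
```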
What GPU are you running on? `eos_token` looks wrong in your last snippet. Can you just use the provided pipeline? I don't think you need an attention mask either. This...
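A minimal sketch of what I mean by using the provided pipeline, assuming a Dolly-style model that ships its generation code via `trust_remote_code` (the model id is an assumption here); the pipeline code handles `eos_token` and attention-mask details for you:

```python
import torch
from transformers import pipeline

# The model repo's own pipeline code takes care of tokenization details.
generate_text = pipeline(
    model="databricks/dolly-v2-3b",  # assumed model id, for illustration
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)
print(generate_text("Explain what LoRA is in one sentence."))
```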
With LoRA, you save the adapter. Then load the base model, and load the adapter on top of it. See the docs and examples on HF. LoRA does not...
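A minimal sketch of that load order with PEFT; the names and paths are placeholders:

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the full base model first...
base = AutoModelForCausalLM.from_pretrained("base-model-name")
# ...then attach the saved LoRA adapter weights on top of it.
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")
```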
Hm, what's this showing? I am not sure why the second one didn't do shell escaping, but I think your token doesn't appear possibly because it was redacted? It would be if...