Allan Jie
You probably need to run `mkdir logs` to create a log folder.
I'm also having an issue with OPT models, specifically Galactica. I will try to make an example for that
Same issue here when loading a very large model.
Same problem here.
I simply switched to vLLM. Sorry, Microsoft :(
Thanks for the clarification. Now I understand the situation. One more question regarding the decoding procedure for Linear-chain CRF. 1. I figured out how to do argmax/Viterbi in O(log N)...
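The comment above is cut off, so the exact method it refers to is unknown. As a rough sketch only, one common way to get logarithmic parallel depth for linear-chain decoding is to treat each transition as a matrix in the max-plus semiring and combine adjacent matrices pairwise in a tree. The function names and the toy potentials below are hypothetical, and the code only computes the best score (recovering the argmax path would need extra bookkeeping, e.g. backpointers or gradients).

```python
from functools import reduce


def maxplus_matmul(a, b):
    """Max-plus product: out[i][j] = max over k of a[i][k] + b[k][j]."""
    inner, cols = len(b), len(b[0])
    return [
        [max(row[k] + b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def tree_reduce(mats):
    """Combine adjacent pairs until one matrix remains.

    Each round could run its products in parallel, so the number of
    rounds (the parallel depth) is about log2(len(mats)).
    """
    mats = list(mats)
    rounds = 0
    while len(mats) > 1:
        nxt = [
            maxplus_matmul(mats[i], mats[i + 1])
            for i in range(0, len(mats) - 1, 2)
        ]
        if len(mats) % 2 == 1:
            nxt.append(mats[-1])  # odd one out keeps its position
        mats = nxt
        rounds += 1
    return mats[0], rounds


def viterbi_score_tree(mats):
    """Best path score via tree reduction; also returns the round count."""
    total, rounds = tree_reduce(mats)
    return max(max(row) for row in total), rounds


def viterbi_score_sequential(mats):
    """Reference: left-to-right max-plus product (depth N - 1)."""
    total = reduce(maxplus_matmul, mats)
    return max(max(row) for row in total)


# Hypothetical toy potentials: mats[t][i][j] scores tag i at position t
# followed by tag j at position t + 1 (emission scores folded in).
mats = [
    [[1, 0], [0, 2]],
    [[0, 3], [1, 0]],
    [[2, 0], [0, 1]],
    [[0, 1], [4, 0]],
    [[1, 1], [0, 2]],
]
tree_score, rounds = viterbi_score_tree(mats)
```

Because the max-plus product is associative, the tree-ordered reduction gives the same score as the sequential one. The total work is higher than standard Viterbi (each product costs O(C^3) for C tags), so this trades work for depth.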
Thanks. That helps a lot. I think I need some more time to figure out the details of `.backward`. What I did at my side at the moment is to...
Sure. That would be great. One of my goals is exactly building these models incorporated with the current HF backend. CKY is something that I'm really looking forward to. Another...
BTW, I also found that Berg-Kirkpatrick et al. (2012) mention in a footnote that we can also use `delta (x_i) < 0` besides `delta > 2*orig_delta`.
Facing the same issue with DeepSpeed 0.13.4. Training with PEFT: QLoRA + DeepSpeed ZeRO Stage 3, offloading parameters and optimizer to CPU. Model: LLaMA2. Training is fine. After training, we...