Allan Jie comments

Results: 72 comments by Allan Jie

I don't know how to run your code

Probably run `mkdir logs` first to create the log folder.
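If the scripts are launched from Python instead of a shell, the same step can be done in code. This is a minimal sketch that assumes the code expects a `logs` folder relative to the current working directory:

```python
import os

# Create the "logs" folder the scripts write to; exist_ok=True avoids an
# error if it already exists (same effect as `mkdir -p logs`).
LOG_DIR = "logs"  # assumption: folder name taken from the comment above
os.makedirs(LOG_DIR, exist_ok=True)
```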

[BUG] DeepSpeed Inference with GPT-J using batches with padding gives wrong outputs

I'm also seeing this issue with OPT models, specifically Galactica. I'll try to put together an example for that.
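As background only (whether this is the cause of the DeepSpeed bug is not established in the thread): batched generation with decoder-only models such as GPT-J and OPT usually left-pads the prompts and relies on the attention mask, plus positions derived from it, so that a padded row behaves like the same prompt without padding. The pure-Python sketch below shows that bookkeeping. `left_pad_batch` and the pad id of 0 are hypothetical; a real setup would use the tokenizer's own pad id.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad token id; use the tokenizer's real pad id


def left_pad_batch(
    seqs: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Left-pad token id sequences; build attention masks and position ids.

    Position ids come from the mask (running count of real tokens minus one),
    so the first real token of every row gets position 0 regardless of padding.
    """
    max_len = max(len(s) for s in seqs)
    input_ids, attention_mask, position_ids = [], [], []
    for s in seqs:
        n_pad = max_len - len(s)
        ids = [pad_id] * n_pad + list(s)
        mask = [0] * n_pad + [1] * len(s)
        pos, count = [], 0
        for m in mask:
            count += m
            # Pad slots are clamped to 0; they are masked out anyway.
            pos.append(max(count - 1, 0))
        input_ids.append(ids)
        attention_mask.append(mask)
        position_ids.append(pos)
    return input_ids, attention_mask, position_ids


ids, mask, pos = left_pad_batch([[11, 12, 13], [21]])
```

If a kernel ignores the mask or assumes positions start at the first column, the padded row `[0, 0, 21]` would be scored differently from the unpadded `[21]`.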

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 2762685)

Same issue here when loading a very large model.

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii

Same problem here.

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii
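The missing symbol is an Itanium C++ ABI mangled name. Running it through `c++filt` should give `cuda_wf6af16_linear(at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, at::Tensor&, int, int, int, int)`. In other words, `inference_core_ops.so` references a function that the loader could not resolve. As a minimal, dependency-free sketch, the function name alone can be read off the `_Z<length><name>` prefix. `mangled_function_name` is a hypothetical helper, and it only handles simple unscoped names:

```python
import re

SYMBOL = "_Z19cuda_wf6af16_linearRN2at6TensorES1_S1_S1_S1_S1_iiii"


def mangled_function_name(symbol: str) -> str:
    """Extract the function name from a simple `_Z<len><name>...` symbol.

    Handles only unscoped, non-template names (Itanium C++ ABI); nested
    names such as `_ZN...E` are out of scope for this sketch.
    """
    m = re.match(r"_Z(\d+)", symbol)
    if m is None:
        raise ValueError(f"not a simple mangled name: {symbol!r}")
    start, length = m.end(), int(m.group(1))
    if start + length > len(symbol):
        raise ValueError(f"length prefix overruns symbol: {symbol!r}")
    return symbol[start:start + length]


print(mangled_function_name(SYMBOL))  # cuda_wf6af16_linear
```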

I simply switched to vLLM. Sorry, Microsoft :(

Question on the Complexity of CKY

Thanks for the clarification; now I understand the situation. One more question regarding the decoding procedure for the linear-chain CRF: 1. I figured out how to do argmax/Viterbi in O(log N)...
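For reference, one way Viterbi reaches O(log N) depth is to note that max-plus products of the transition-score matrices are associative. The N-1 products can therefore be combined in a balanced tree, and all products within a round are independent and could run in parallel. The pure-Python sketch below computes only the best path score and checks it against the usual sequential recursion. It assumes unary scores are already folded into the edge potentials. The argmax path itself would additionally need backpointers or a gradient through the max, which is not shown.

```python
from typing import List

Matrix = List[List[float]]


def maxplus_matmul(a: Matrix, b: Matrix) -> Matrix:
    """(a (x) b)[i][k] = max_j a[i][j] + b[j][k] in the max-plus semiring."""
    m, p = len(b), len(b[0])
    return [[max(a[i][j] + b[j][k] for j in range(m)) for k in range(p)]
            for i in range(len(a))]


def viterbi_score_tree(edges: List[Matrix]) -> float:
    """Best path score via a balanced tree of max-plus products.

    edges[t][i][j] scores label i at position t followed by label j at t + 1.
    Each round combines adjacent pairs independently, so there are
    ceil(log2(len(edges))) rounds.
    """
    level = list(edges)
    while len(level) > 1:
        nxt = [maxplus_matmul(level[k], level[k + 1])
               for k in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # odd one out carries to the next round
        level = nxt
    return max(max(row) for row in level[0])


def viterbi_score_sequential(edges: List[Matrix]) -> float:
    """Standard left-to-right Viterbi recursion: N - 1 sequential steps."""
    c = len(edges[0])
    best = [0.0] * c
    for e in edges:
        best = [max(best[i] + e[i][j] for i in range(c)) for j in range(c)]
    return max(best)


edges = [[[1, 0], [0, 2]], [[0, 3], [1, 0]], [[2, 0], [0, 1]]]
print(viterbi_score_tree(edges), viterbi_score_sequential(edges))  # 5 5
```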

Question on the Complexity of CKY

Thanks, that helps a lot. I think I need some more time to figure out the details of `.backward`. What I'm doing on my side at the moment is to...
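This sketch assumes the `.backward` detail refers to the usual structured-prediction trick, used by some libraries, of obtaining marginals by backpropagating through the log-partition function. The identity is that the gradient of log Z with respect to an edge log-potential equals that edge's marginal probability. The dependency-free sketch below checks this numerically for a linear-chain model, comparing forward-backward marginals with a central finite-difference gradient of log Z. All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(edges: List[Matrix]) -> float:
    """Forward algorithm: log-sum-exp of the scores of all label paths."""
    c = len(edges[0])
    alpha = [0.0] * c
    for e in edges:
        alpha = [logsumexp([alpha[i] + e[i][j] for i in range(c)]) for j in range(c)]
    return logsumexp(alpha)


def edge_marginals(edges: List[Matrix]) -> List[Matrix]:
    """Forward-backward: marginals[t][i][j] = P(y_t = i, y_{t+1} = j)."""
    n, c = len(edges), len(edges[0])
    alphas = [[0.0] * c]  # alphas[t][i]: log-score of all prefixes ending in i at t
    for e in edges:
        prev = alphas[-1]
        alphas.append([logsumexp([prev[i] + e[i][j] for i in range(c)])
                       for j in range(c)])
    betas = [[0.0] * c]  # built right to left, reversed below
    for e in reversed(edges):
        nxt = betas[-1]
        betas.append([logsumexp([e[i][j] + nxt[j] for j in range(c)])
                      for i in range(c)])
    betas.reverse()  # betas[t][i]: log-score of all suffixes starting in i at t
    log_z = logsumexp(alphas[-1])
    return [[[math.exp(alphas[t][i] + edges[t][i][j] + betas[t + 1][j] - log_z)
              for j in range(c)] for i in range(c)] for t in range(n)]


def finite_diff_grad(edges: List[Matrix], t: int, i: int, j: int,
                     eps: float = 1e-5) -> float:
    """Central-difference estimate of d log Z / d edges[t][i][j]."""
    plus = [[row[:] for row in m] for m in edges]
    minus = [[row[:] for row in m] for m in edges]
    plus[t][i][j] += eps
    minus[t][i][j] -= eps
    return (log_partition(plus) - log_partition(minus)) / (2 * eps)


edges = [[[0.5, -1.0], [0.2, 0.3]], [[1.0, 0.0], [-0.5, 0.7]]]
marg = edge_marginals(edges)
# Each marginal should match the corresponding gradient of log Z.
print(marg[0][1][0], finite_diff_grad(edges, 0, 1, 0))
```

An autograd framework computes the same gradient exactly, which is why calling `.backward` on log Z yields the marginals.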

Question on the Complexity of CKY

Sure, that would be great. One of my goals is precisely to build these models on top of the current HF backend. CKY is something I'm really looking forward to. Another...
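Since the thread is about CKY's complexity, a minimal Viterbi-CKY sketch over a grammar in Chomsky normal form makes the cost visible. There are nested loops over span length, start position, split point, and binary rules, which gives O(N^3 · |R|) time for N words and |R| rules. The toy grammar and the `cky_best_score` name are hypothetical illustrations, not any library's API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NEG_INF = float("-inf")

# Hypothetical toy grammar in Chomsky normal form, with log-scores.
BINARY: Dict[Tuple[str, str, str], float] = {  # (parent, left, right) -> score
    ("S", "NP", "VP"): 0.0,
    ("VP", "V", "NP"): 0.0,
}
LEXICAL: Dict[Tuple[str, str], float] = {  # (tag, word) -> score
    ("NP", "she"): -1.0,
    ("V", "eats"): -0.5,
    ("NP", "fish"): -1.0,
}


def cky_best_score(words: List[str], binary: Dict[Tuple[str, str, str], float],
                   lexical: Dict[Tuple[str, str], float], root: str = "S") -> float:
    """Best derivation score for `root` over the whole sentence."""
    n = len(words)
    chart: Dict[Tuple[int, int], Dict[str, float]] = defaultdict(dict)
    for i, w in enumerate(words):  # width-1 spans from the lexicon
        for (tag, word), s in lexical.items():
            if word == w:
                chart[(i, i + 1)][tag] = max(s, chart[(i, i + 1)].get(tag, NEG_INF))
    for length in range(2, n + 1):              # O(N) span lengths
        for i in range(0, n - length + 1):      # O(N) start positions
            j = i + length
            cell = chart[(i, j)]
            for k in range(i + 1, j):           # O(N) split points
                left, right = chart[(i, k)], chart[(k, j)]
                for (parent, l, r), s in binary.items():  # O(|R|) rules
                    if l in left and r in right:
                        cand = s + left[l] + right[r]
                        if cand > cell.get(parent, NEG_INF):
                            cell[parent] = cand
    return chart[(0, n)].get(root, NEG_INF)


WORDS = "she eats fish".split()
print(cky_best_score(WORDS, BINARY, LEXICAL))  # -2.5
```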

Possible bug

BTW, I also found that Berg-Kirkpatrick et al. (2012) mention in a footnote that we can also use `delta(x_i) < 0` in addition to `delta > 2*orig_delta`.
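Here is a minimal sketch of the paired bootstrap test that counts both criteria side by side. It assumes per-example scores whose mean is the evaluation metric (e.g. accuracy) and that system A beats B on the original test set (`orig_delta > 0`). Corpus-level metrics such as BLEU would need the metric recomputed on each resample instead. `paired_bootstrap` is a hypothetical helper, not the code from the thread.

```python
import random
from typing import List, Tuple


def paired_bootstrap(scores_a: List[float], scores_b: List[float],
                     n_samples: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Paired bootstrap estimates of how often A's advantage over B fails.

    Returns (fraction of resamples with delta > 2 * orig_delta,
             fraction of resamples with delta < 0).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("systems must be scored on the same examples")
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    orig_delta = sum(diffs) / n  # assumed > 0: A is better on the test set
    rng = random.Random(seed)
    above_twice = below_zero = 0
    for _ in range(n_samples):
        # Resample examples with replacement, keeping A/B pairs together.
        delta = sum(diffs[rng.randrange(n)] for _ in range(n)) / n
        if delta > 2 * orig_delta:
            above_twice += 1
        if delta < 0:
            below_zero += 1
    return above_twice / n_samples, below_zero / n_samples


sys_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]  # toy per-example correctness
sys_b = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
p_twice, p_negative = paired_bootstrap(sys_a, sys_b)
```

The first count is the shifted-distribution criterion from the paper's main text; the second simply asks how often the resampled difference flips sign.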

RuntimeError: 'weight' must be 2-D while training Flan-T5 models with stage 3

I'm facing the same issue with DeepSpeed 0.13.4. Setup: PEFT (QLoRA) + DeepSpeed ZeRO Stage 3, with parameters and optimizer offloaded to CPU. Model: LLaMA2. Training works fine. After training, we...
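For anyone trying to reproduce this, here is a sketch of a DeepSpeed config matching the described setup: ZeRO stage 3, with both parameters and optimizer states offloaded to CPU. The batch-size, accumulation, and bf16 values are placeholders, not the settings from the comment.

```python
import json

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "gradient_accumulation_steps": 1,     # placeholder value
    "bf16": {"enabled": True},            # assumption: bf16 training
    "zero_optimization": {
        "stage": 3,
        # Offload partitioned parameters and optimizer states to host memory.
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

One plausible reading of the error, not confirmed in the thread, is that under ZeRO-3 every parameter is partitioned across ranks. Code that uses a layer's weight outside the DeepSpeed engine's forward pass (e.g. after training) may then see a flat placeholder instead of the full 2-D matrix. DeepSpeed provides the `deepspeed.zero.GatheredParameters` context manager for temporarily gathering such parameters.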