KyrieMing

Results 9 issues of KyrieMing

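I tried converting Su's supervised-trained RoFormer from TF to torch. Su's parameters include a pooler layer, which has to be filtered out during conversion.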
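A minimal sketch of the filtering step, assuming the converted weights are held in a plain name-to-tensor mapping and that pooler parameters can be recognized by the substring `pooler` in their names. The helper `drop_pooler_weights` and the parameter names below are hypothetical and only illustrate the idea; the real checkpoint may name its layers differently.

```python
def drop_pooler_weights(state_dict, marker="pooler"):
    """Return a copy of state_dict without entries whose name contains marker."""
    return {name: value for name, value in state_dict.items() if marker not in name}


# Hypothetical parameter names and values, for illustration only.
converted = {
    "encoder.layer.0.attention.query.weight": [0.1, 0.2],
    "pooler.dense.weight": [0.3],
    "pooler.dense.bias": [0.4],
}

filtered = drop_pooler_weights(converted)
print(sorted(filtered))
```

Returning a new mapping leaves the original weights untouched, so the dropped entries can still be inspected if the target model turns out to need them.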

I am training a 10B model using DeepSpeed with Megatron on A100 GPUs (80 GB). Here is my ds_report: ![image](https://user-images.githubusercontent.com/56537141/219829623-362d40e8-dc52-4f41-8384-cf76807b5728.png) If I use 4 GPUs, the error is CUDA out of memory...

bug
training

I noticed that the OIG dataset adds human and bot tags to each sample. In your code, you directly pack samples to the max sequence length and calculate cross-entropy on the whole...

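@ymcui Hello, I saw that 7B-Plus performs extremely well on Harmless. How were the prompts for this part of the data obtained? The Alpaca approach would presumably be refused by ChatGPT. Is the only option to collect open-source data, curate it manually, and then query ChatGPT?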

stale

Hello, I am excited that gpt-neox now supports the LLaMA model. However, the script tools/convert_raw_llama_weights_to_neox.py only supports the original LLaMA weights. Considering the large number of users currently using Huggingface, would...

feature request

**Describe the bug** I have trained a 1.3B model on 64 A100 80G GPUs. I exported the saved checkpoints except for the DeepSpeed ZeRO optimizer states; the structure of the exported checkpoints is the same...

bug

Does ReplitLM support gradient checkpointing?