KyrieMing issues

Results: 9 issues of KyrieMing

Filter out the pooler layer when converting a TF model to torch

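I tried converting Su's (苏神) supervised-trained RoFormer from TF to torch. His parameters include a pooler layer, which has to be filtered out during conversion.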
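A minimal sketch of the filtering step described in this issue, assuming the checkpoint has already been loaded into a plain name-to-weight mapping. The helper `drop_pooler_weights` and the weight names below are hypothetical and only illustrate the idea; the real conversion script and variable names may differ.

```python
def drop_pooler_weights(weights, marker="pooler"):
    """Return a copy of `weights` without entries whose name contains `marker`."""
    return {name: value for name, value in weights.items() if marker not in name}


# Hypothetical weight names, used only for illustration.
tf_weights = {
    "bert/embeddings/word_embeddings": [0.1, 0.2],
    "bert/encoder/layer_0/attention/self/query/kernel": [0.3],
    "bert/pooler/dense/kernel": [0.4],
    "bert/pooler/dense/bias": [0.5],
}

filtered = drop_pooler_weights(tf_weights)
print(sorted(filtered))
```

Filtering by substring is the simplest option; if other layer names could contain the word "pooler", matching on a fuller prefix would be safer.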

[BUG] Get "exits with return code = -9" when Creating fp16 ZeRO stage 2 optimizer

I am training a 10B model using DeepSpeed with Megatron on A100 GPUs (80G). Here is my ds_report ![image](https://user-images.githubusercontent.com/56537141/219829623-362d40e8-dc52-4f41-8384-cf76807b5728.png) If I use 4 GPUs, the error is CUDA out of memory...

bug

training
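As a rough back-of-envelope check on the memory pressure reported in this issue, the sketch below estimates model-state memory per GPU under ZeRO stage 2. It assumes mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer states), pure data parallelism with no tensor or pipeline parallelism, and no offloading. None of these settings are confirmed by the issue, and activations and communication buffers are not counted. The function name `zero2_bytes_per_gpu` is hypothetical.

```python
def zero2_bytes_per_gpu(num_params, num_gpus):
    """Estimate model-state bytes per GPU under ZeRO stage 2 with mixed-precision Adam."""
    fp16_params = 2 * num_params                   # replicated on every GPU
    fp16_grads = 2 * num_params // num_gpus        # partitioned across GPUs in stage 2
    optimizer_states = 12 * num_params // num_gpus  # fp32 weights, momentum, variance; partitioned
    return fp16_params + fp16_grads + optimizer_states


for n in (4, 8):
    gb = zero2_bytes_per_gpu(10_000_000_000, n) / 1e9
    print(f"{n} GPUs: ~{gb:.1f} GB of model states per GPU")
```

Under these assumptions a large share of each 80G card is taken by model states alone on 4 GPUs. A return code of -9 means the process received SIGKILL, which on Linux is often the kernel's out-of-memory killer acting on host RAM rather than GPU memory, so both limits may be worth checking.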

Why does instruction tuning calculate the loss on the whole sentence?

I noticed that the OIG dataset adds human and bot tags in each sample. In your code, you directly pack samples to max seq length and calculate cross entropy on whole...
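The distinction raised in this question can be sketched as follows: a loss averaged over every token versus a loss averaged only over the bot's response tokens. This is an illustration, not the repository's actual training code. The token ids, the probabilities, and the helper names `mask_labels` and `mean_nll` are made up for the example. The value -100 mirrors the default `ignore_index` of PyTorch's cross-entropy loss, but here it is only a sentinel in plain Python.

```python
import math

IGNORE_INDEX = -100  # sentinel for positions excluded from the loss


def mask_labels(token_ids, is_response):
    """Keep labels for response tokens and replace all others with IGNORE_INDEX."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(token_ids, is_response)]


def mean_nll(correct_token_probs, labels):
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX."""
    kept = [-math.log(p) for p, lab in zip(correct_token_probs, labels) if lab != IGNORE_INDEX]
    return sum(kept) / len(kept)


# Made-up sample: three human-prompt tokens followed by two bot-response tokens.
token_ids = [11, 12, 13, 21, 22]
is_response = [False, False, False, True, True]
# Made-up probabilities the model assigns to the correct token at each position.
probs = [0.5, 0.5, 0.5, 0.25, 0.25]

whole_sequence_loss = mean_nll(probs, token_ids)
response_only_loss = mean_nll(probs, mask_labels(token_ids, is_response))
print(whole_sequence_loss, response_only_loss)
```

With whole-sequence loss the prompt tokens also contribute gradient, while masking restricts training to the bot's turns.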

How can the Harmless data be obtained?

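@ymcui Hello, I saw that 7B-Plus performs extremely well on Harmless. How were the prompts for this data obtained? An Alpaca-style approach would presumably be refused by ChatGPT. Is the only option to gather prompts from open-source data and manual curation, and then ask ChatGPT?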

stale

How is the harmlessness data generated? An Alpaca-style approach gets refused by ChatGPT.

Convert HF Llama Checkpoints to Neox Checkpoints

Hello, I am excited that gpt-neox now supports the LLaMA model. However, the script in tools/convert_raw_llama_weights_to_neox.py only supports the original LLaMA weights. Considering the large number of users currently using Huggingface, would...

feature request

Finetuning loss explodes when not loading DeepSpeed ZeRO optimizer states

**Describe the bug** I have trained a 1.3B model on 64 A100 80G GPUs. I exported the saved checkpoints except the DeepSpeed ZeRO optimizer states; the exported ckpts structure is same...

bug

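How does the 3.5M dataset open-sourced on May 11 relate to the earlier 1M, 2M, generate_chat, and school math datasets? Is it a collection of the previously released datasets, or completely independent?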

Does ReplitLM support gradient checkpoints?