Tingchen Fu
I encountered a similar issue. I continually trained a bloom-560m model and converted a saved checkpoint with zero_to_fp32.py. But when I tried to reload the converted checkpoint without DeepSpeed:...
Thanks! @mayank31398 Sorry for the late response. I just tried your recipe and it works:
```
import torch
from transformers import BloomForCausalLM, BloomConfig

configuration = BloomConfig.from_pretrained('/apdcephfs/share_916081/tingchenfu/PLM/bloom-560m')
model = BloomForCausalLM(configuration)
reloaded = ...
```
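The reload pattern in that recipe can be sketched as follows. This is a minimal stand-in, not the actual setup: a tiny `nn.Linear` replaces bloom-560m, and the saved file stands in for the fp32 state dict produced by zero_to_fp32.py; the path is illustrative.

```python
import torch
import torch.nn as nn

# Build a model from its configuration, then load converted weights into it.
# A tiny nn.Linear stands in for BloomForCausalLM(configuration) here.
model = nn.Linear(4, 4)

# Stand-in for the fp32 checkpoint written by zero_to_fp32.py.
torch.save(model.state_dict(), "/tmp/pytorch_model.bin")

# Reload without DeepSpeed: construct a fresh model, then load_state_dict.
reloaded = nn.Linear(4, 4)
reloaded.load_state_dict(torch.load("/tmp/pytorch_model.bin"))

# The reloaded weights match the originals.
assert torch.equal(model.weight, reloaded.weight)
```

The point is that after conversion the checkpoint is a plain fp32 state dict, so the usual `load_state_dict` path works with no DeepSpeed dependency.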
@Oxi84 Hi, so what is your learning rate? I use 1e-5 and met a similar problem: the output of the tuned model is exactly the same as the original...
@sgugger It seems that `num_processes` is entangled with the number of GPUs to use. If it is set to 1, only one GPU will be used even when there are multiple GPUs, and...
Hello, is there any solution now? @JosephChotard I met the same issue when loading the BLOOM model. My transformers version is 4.31.0.dev0 and my bitsandbytes version is 0.39. Both are...
+1 I ran HumanEval with baichuan-2-7b-base and my pass@1 score was also only around 3.