chingfeng2021
During handling of the above exception, another exception occurred: Traceback (most recent call last): File "./train_gpt_demo.py", line 353, in main() File "./train_gpt_demo.py", line 267, in main model = zero_model_wrapper(model, zero_stage,...
What configuration is needed to get this running? @JThh
+---------------------------------------------------------------------------------------+ | NVIDIA-SMI 530.30.02 Driver Version: 531.18 CUDA Version: 12.1 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage |...
@JThh This is my current setup: two GPUs, a 3080 and a 1070.
I have already set the batch size very small, but it still says there is not enough memory. export PLACEMENT=${PLACEMENT:-"cput"} ----> Should this parameter be set to cuda instead?
@ht-zhou Could not find 'RANK' in the torch environment. How much GPU memory does this strategy need? I gave it a try and ran into this error: Traceback (most recent call last): File "train_prompts.py", line 122, in main(args) File "train_prompts.py", line 25, in main strategy = ColossalAIStrategy(stage=2,...
Which tutorial are you following?
this is very good, pls merge it