chingfeng2021

Results 14 comments of chingfeng2021

During handling of the above exception, another exception occurred: Traceback (most recent call last): File "./train_gpt_demo.py", line 353, in main() File "./train_gpt_demo.py", line 267, in main model = zero_model_wrapper(model, zero_stage,...

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 531.18       CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage |...

@JThh This is my current setup: two GPUs, a 3080 and a 1070.

I have set the batch size very small, but it still reports that there is not enough memory. export PLACEMENT=${PLACEMENT:-"cput"} ----> Should this parameter be set to cuda?
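For reference, the shell form ${PLACEMENT:-"cput"} only supplies a fallback: it yields the current value of PLACEMENT when that variable is set and non-empty, and the quoted default otherwise. A minimal Python sketch of the same rule is below. The helper name resolve_placement is hypothetical and is not part of the training script.

```python
import os


def resolve_placement(env=None, default="cput"):
    """Mimic the shell expansion ${PLACEMENT:-default} (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get("PLACEMENT")
    # ':-' falls back when the variable is unset OR set to an empty string.
    return value if value else default


if __name__ == "__main__":
    print(resolve_placement({}))                       # variable unset
    print(resolve_placement({"PLACEMENT": "cuda"}))    # variable exported beforehand
```

Under this rule, exporting PLACEMENT before running the script overrides the default without editing the line. Whether the script accepts a given value is not confirmed here.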

@ht-zhou Could not find 'RANK' in the torch environment. How much GPU memory does this strategy need? I tried it and got this error: Traceback (most recent call last): File "train_prompts.py", line 122, in main(args) File "train_prompts.py", line 25, in main strategy = ColossalAIStrategy(stage=2,...
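The message "Could not find 'RANK' in the torch environment" usually indicates that the script was started without a distributed launcher. Launchers such as torchrun export RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT to each worker process. A minimal sketch that checks for these variables before any distributed setup is below. The helper name missing_dist_env is hypothetical, and the launch command in the message is only an assumed example.

```python
import os

# Variables that torchrun sets for every worker process.
REQUIRED_VARS = ("RANK", "LOCAL_RANK", "WORLD_SIZE", "MASTER_ADDR", "MASTER_PORT")


def missing_dist_env(env=None):
    """Return the required distributed variables absent from env (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if name not in env]


if __name__ == "__main__":
    missing = missing_dist_env()
    if missing:
        print("Missing distributed variables:", ", ".join(missing))
        # Assumed example invocation; adjust the process count and arguments.
        print("Try launching with: torchrun --nproc_per_node=1 train_prompts.py ...")
    else:
        print("Distributed environment looks complete.")
```

If the list is non-empty when the script is run with plain python, the launcher is the likely cause rather than GPU memory.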

Which tutorial did you follow?