fpcsong
Results: 3 comments of fpcsong
It does not crash directly, but it creates multiple processes on cuda 0.
It is our internal toolkit and is adapted to many transformer-based models. The script ``` deepspeed --num_gpus 8 benchmark.py \ -it \ -t_data $TRAINDATA \ -te \ -v_data $EVALDATA \...
It is our internal toolkit. In short, could you please provide your versions of cuda, torch, deepspeed, flash_attn, xformers, and other key packages?
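One way to gather the versions asked for above is a small script like the following. This is only a sketch: the helper names `collect_versions` and `cuda_build_version` are made up for illustration, and the distribution names in `PACKAGES` are assumed to match what is installed in your environment (for example, flash_attn may be registered as `flash-attn`).

```python
from importlib import metadata

# Packages named in the comment above. The exact distribution names are an
# assumption and may differ in your environment.
PACKAGES = ["torch", "deepspeed", "flash_attn", "xformers"]


def collect_versions(packages=PACKAGES):
    """Return a dict mapping each package name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def cuda_build_version():
    """Return the CUDA version torch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
    print(f"cuda (torch build): {cuda_build_version()}")
```

Pasting the printed lines into the issue gives everyone the same baseline to compare against. Note that the CUDA value reported here is the one torch was compiled against, which can differ from the system toolkit or driver version.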