Assi Jakoby

8 comments by Assi Jakoby

Hi, using https://github.com/microsoft/Megatron-DeepSpeed for pipeline + ZeRO-1 + bfloat16 doesn't work with DeepSpeed. When I run the script examples/run_deepspeed_example.sh with ZeRO-1 and bfloat16 (the script works with fp16), I get...

The code works with ZeRO stage 0; however, I would like to use ZeRO stage 1 in order to shard the optimizer states / calculations. When is pipeline +...

@baojianzhou I trained the model again with t2t-trainer with tf.disable_v2_behavior() set; however, t2t-decoder still has issues. Can you please attach the files that you are using, including the train command...

@wjm41 Thanks. Are you using t2t-trainer with --optionally_use_dist_strat=True?

@wjm41 Are you using t2t tag 1.15.7 as is, with only the above 2 changes? Are you training on multiple GPUs or 1 GPU? I'm working on multiple GPUs....

@PSZehnder Does Mesh TensorFlow support multi-node training (i.e. each node has #x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...

@nshazeer Does Mesh TensorFlow support multi-node training (i.e. each node has #x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...

@nshazeer, thanks for your reply. If I can make the 16 GPUs visible, how will the data loading be done in a 2 node * 8 GPU setup? Will...