Assi Jakoby comments

8 comments


Add explicit gradient_accumulation_dtype config

Hi, when using https://github.com/microsoft/Megatron-DeepSpeed with pipeline parallelism + ZeRO stage 1 + bfloat16, DeepSpeed doesn't work. When I run the script examples/run_deepspeed_example.sh with ZeRO-1 and bfloat16 (the script works with fp16), I get...

Add explicit gradient_accumulation_dtype config

The code works with ZeRO stage 0; however, I would like to use ZeRO stage 1 in order to shard the optimizer states/calculations. When is pipeline +...

T2T 1.15.7 version with Tensorflow 2.2 - t2t-decoder doesn't run

@baojianzhou I trained the model again with t2t-trainer, calling tf.disable_v2_behavior(); however, t2t-decoder still has issues. Can you please attach the files you are using, including the train command...

T2T 1.15.7 version with Tensorflow 2.2 - t2t-decoder doesn't run

@wjm41 Thanks. Are you running t2t-trainer with --optionally_use_dist_strat=True?

T2T 1.15.7 version with Tensorflow 2.2 - t2t-decoder doesn't run

@wjm41 Are you using T2T tag 1.15.7 as is, with only the two changes above? Are you training on multiple GPUs or a single GPU? I'm working on multiple GPUs...

Performance on GPUs and multiple GPU support

@PSZehnder Does Mesh TensorFlow support multi-node training (i.e., each node has x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...

Performance on GPUs and multiple GPU support

@nshazeer Does Mesh TensorFlow support multi-node training (i.e., each node has x GPUs attached to it)? I'm using 2 nodes, each with 8 GPUs, and would like to...

Performance on GPUs and multiple GPU support

@nshazeer Thanks for your reply. If I can make all 16 GPUs visible, how will data loading be done in a 2-node × 8-GPU setup? Will...