stanford_alpaca: training on 2 nodes is slower than on 1 node

Open hujunchao opened this issue 2 years ago • 2 comments

When I train on two A100 nodes (each with 8 × 80 GB GPUs), training is slower than on a single node. I launch with torchrun xxx. Has anyone else run into this?

Apr 11 '23 00:04 hujunchao

I am getting this error on a single node: https://github.com/tatsu-lab/stanford_alpaca/issues/189#issue-1658173995. Any idea what the problem is?

Apr 11 '23 16:04 Ahtesham00

Sorry, I use the default parameters and have not run into that error.

Apr 12 '23 01:04 hujunchao