walton-wang929

Results 2 issues of walton-wang929

When I use multi-GPU training, I find that GPU 0 uses a lot of memory. Why? As a result, I cannot use a big batch size or a large hidden size due...