Zhang Yuchang

5 comments by Zhang Yuchang

Hi, are there bias terms in your state_dict, or does your network use only weight matrices and activations?
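One quick way to check this is to look at the parameter names in the checkpoint. The sketch below assumes a PyTorch-style state_dict (a mapping from parameter names to tensors) and the default naming, where bias parameters end in `bias` (for example `fc1.bias`). The helper `split_bias_keys` and the example layer names are hypothetical, and plain lists stand in for tensors so the snippet runs without any framework installed.

```python
from collections import OrderedDict


def split_bias_keys(state_dict):
    """Split parameter names into bias and non-bias entries by name suffix.

    Assumes the default convention where a bias parameter is named "bias"
    or "<module>.bias"; custom parameter names would need a different rule.
    """
    bias = [k for k in state_dict if k == "bias" or k.endswith(".bias")]
    other = [k for k in state_dict if k not in bias]
    return bias, other


# Hypothetical checkpoint contents: plain lists stand in for tensors.
example = OrderedDict([
    ("fc1.weight", [[0.1, 0.2], [0.3, 0.4]]),
    ("fc1.bias", [0.0, 0.0]),
    ("fc2.weight", [[0.5, 0.6]]),
])

bias, other = split_bias_keys(example)
print("bias entries:", bias)
print("other entries:", other)
```

If the bias list comes back empty for a real checkpoint, the layers were most likely built without bias terms, though a model with custom parameter names could hide them under other keys.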

Same issue. Did you solve this problem?

Oh, okay. Maybe this is a direction worth trying. Have you also been researching this project recently?

> Maybe mini-batch optimization + gradient accumulation will help? currently the code using...
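For reference, the gradient accumulation idea from the quoted suggestion can be sketched in a few lines. This is only an illustration, not the project's code: it assumes a scalar linear model `y_hat = w * x` with a mean squared error loss, and the function names and data are made up. The point it shows is that summing micro-batch gradients, each weighted by its share of the data, reproduces the full-batch gradient while only one micro-batch is processed at a time.

```python
def grad_mse(w, xs, ys):
    """Gradient of the mean squared error w.r.t. w for y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients into one full-batch gradient.

    Each micro-batch gradient is weighted by len(micro_batch) / n so that
    uneven final batches are still handled correctly.
    """
    n = len(xs)
    total = 0.0
    for start in range(0, n, micro_batch_size):
        mx = xs[start:start + micro_batch_size]
        my = ys[start:start + micro_batch_size]
        total += grad_mse(w, mx, my) * len(mx) / n
    return total


# Made-up data following y = 2 * x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, micro_batch_size=2)
print(full, accum)

# A single optimizer step is taken only after all micro-batches are accumulated.
lr = 0.01
w_new = w - lr * accum
print(w_new)
```

Whether this helps in practice depends on what the current code does, which the truncated quote does not say.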