孙铭声

Results: 4 comments by 孙铭声

Hello, at Alibaba I went through three rounds of technical interviews plus one cross-team interview, and then one more technical interview was added. May I ask whether the extra round was added because there were doubts about my ability?

I only use the moe_layer function in my code, and the parameters of all experts are the same. dist_rank=2, my CUDA version is 10.1 and my PyTorch version is 1.8. `self.ffn_text = tutel_moe.moe_layer( gate_type={'type':...
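For context, a construction like the truncated snippet above might look roughly like the following minimal sketch, modeled on Tutel's public helloworld example; the keyword names (`gate_type`, `experts`, `model_dim`) and the dimensions used here are assumptions and may differ between Tutel versions.

```python
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe

# Minimal sketch, assuming a helloworld-style Tutel API; argument names
# such as gate_type / experts / model_dim may vary across Tutel versions.
class TextFFN(torch.nn.Module):
    def __init__(self, model_dim=1024, num_local_experts=2, hidden_size=4096, top_k=2):
        super().__init__()
        self.ffn_text = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': top_k},
            experts={
                'type': 'ffn',
                'count_per_node': num_local_experts,
                'hidden_size_per_expert': hidden_size,
                'activation_fn': lambda x: F.relu(x),
            },
            model_dim=model_dim,
        )

    def forward(self, x):
        # x: [batch, seq_len, model_dim]
        return self.ffn_text(x)
```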

I have changed my DistributedDataParallel to torch.nn.parallel.DistributedDataParallel and that problem has been solved, but now there is a new error.... `Traceback (most recent call last): File "Train_tutel.py", line 354, in main()...
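A minimal sketch of the standard torch.nn.parallel.DistributedDataParallel setup referred to above, assuming single-node multi-GPU training where the launcher (e.g. torch.distributed.launch with --use_env, or torchrun) sets LOCAL_RANK for each process; the helper name is a placeholder.

```python
import os
import torch
import torch.distributed as dist

def setup_ddp(model):
    # Standard DDP wrapping; assumes the launcher exports LOCAL_RANK per process.
    local_rank = int(os.environ.get('LOCAL_RANK', 0))
    if not dist.is_initialized():
        dist.init_process_group(backend='nccl')
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # One process per GPU: each DDP replica owns exactly one device.
    model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        output_device=local_rank,
    )
    return model
```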

When I set find_unused_parameters=True, none of the experts' parameters are updated.
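One way to confirm this behavior is a quick diagnostic run after loss.backward(), checking which parameters actually received gradients; the substring filter 'expert' is an assumption about how Tutel names the expert parameters and may need adjusting.

```python
# Diagnostic sketch: after loss.backward(), report whether each expert
# parameter received a non-zero gradient. The 'expert' name filter is an
# assumption about the parameter naming in the wrapped model.
def report_expert_grads(model):
    for name, param in model.named_parameters():
        if 'expert' in name:
            has_grad = param.grad is not None and param.grad.abs().sum().item() > 0
            print(f'{name}: received gradient = {has_grad}')
```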