孙铭声

Results: 4 comments by 孙铭声

Hello, at Alibaba I went through three rounds of technical interviews plus one cross-team interview, and then one more technical interview was added. May I ask whether the extra round was added because there were doubts about my ability?

I only use the moe_layer function in my code, and the parameters of all experts are the same. dist_rank=2, my CUDA version is 10.1 and my PyTorch version is 1.8. `self.ffn_text = tutel_moe.moe_layer( gate_type={'type':...
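For context, a construction like the truncated snippet above might look roughly like the following minimal sketch, modeled on Tutel's public helloworld example; the keyword names (`gate_type`, `experts`, `model_dim`) and the dimensions used here are assumptions and may differ between Tutel versions.

```python
import torch
import torch.nn.functional as F
from tutel import moe as tutel_moe

# Minimal sketch, assuming a helloworld-style Tutel API; argument names
# such as gate_type / experts / model_dim may vary across Tutel versions.
class TextFFN(torch.nn.Module):
    def __init__(self, model_dim=1024, num_local_experts=2, hidden_size=4096, top_k=2):
        super().__init__()
        self.ffn_text = tutel_moe.moe_layer(
            gate_type={'type': 'top', 'k': top_k},
            experts={
                'type': 'ffn',
                'count_per_node': num_local_experts,
                'hidden_size_per_expert': hidden_size,
                'activation_fn': lambda x: F.relu(x),
            },
            model_dim=model_dim,
        )

    def forward(self, x):
        # x: [batch, seq_len, model_dim]
        return self.ffn_text(x)
```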

I have changed my DistributedDataParallel to torch.nn.parallel.DistributedDataParallel and that problem has been solved, but now there is a new error.... `Traceback (most recent call last): File "Train_tutel.py", line 354, in main()...
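A minimal sketch of the standard torch.nn.parallel.DistributedDataParallel setup referred to above, assuming single-node multi-GPU training where the launcher (e.g. torch.distributed.launch with --use_env, or torchrun) sets LOCAL_RANK for each process; the helper name is a placeholder.

```python
import os
import torch
import torch.distributed as dist

def setup_ddp(model):
    # Standard DDP wrapping; assumes the launcher exports LOCAL_RANK per process.
    local_rank = int(os.environ.get('LOCAL_RANK', 0))
    if not dist.is_initialized():
        dist.init_process_group(backend='nccl')
    torch.cuda.set_device(local_rank)
    model = model.cuda(local_rank)
    # One process per GPU: each DDP replica owns exactly one device.
    model = torch.nn.parallel.DistributedDataParallel(
        model,
        device_ids=[local_rank],
        output_device=local_rank,
    )
    return model
```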

When I set find_unused_parameters=True, none of the experts' parameters are updated.
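One way to confirm this behavior is a quick diagnostic run after loss.backward(), checking which parameters actually received gradients; the substring filter 'expert' is an assumption about how Tutel names the expert parameters and may need adjusting.

```python
# Diagnostic sketch: after loss.backward(), report whether each expert
# parameter received a non-zero gradient. The 'expert' name filter is an
# assumption about the parameter naming in the wrapped model.
def report_expert_grads(model):
    for name, param in model.named_parameters():
        if 'expert' in name:
            has_grad = param.grad is not None and param.grad.abs().sum().item() > 0
            print(f'{name}: received gradient = {has_grad}')
```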