Mozhi Zhang
The above error message was from fairscale 0.3.7. We also tried fairscale 0.4.6 and got a similar error: ``` 2022-05-08 14:35:33,820 INFO | training... rank: 3 | 2022-05-08 14:35:40,808 CRITICAL |...
No. We just tried running this (fairscale 0.4.6): `parlai multiprocessing_train -t projects.seeker.tasks.knowledge,projects.seeker.tasks.dialogue,projects.seeker.tasks.search_query --multitask-weights 2,2,1 -veps 0.25 --attention-dropout 0.0 --batchsize 32 --model transformer/generator --embedding-size 2560 --ffn-size 10240 --variant prelayernorm --n-heads 32...
Sorry, it turns out that `transformer/generator` works fine with batch size > 1. We ran into the above error because we had turned off `flatten_parameter` (which is also strange, but I suppose...
So we just tried training Seeker with fairscale 0.4.4 and got the same error.