Ledzy
> Maybe you can check the original paper: > > > all the network parameters are shared between the source domain and target domain data other than those of the...
> Hi, I don't think I have added a domain-specific BN layer to the last FC in my implementation. Could you point it out to me if possible? Thanks for your...
> > > Maybe you can check the original paper: > > > > all the network parameters are shared between the source domain and target domain data other than...
Same problem for me. Actually, I found that the parameter "FILTER_THRESHOLD" in the config file makes a big difference. According to what I found, setting it to 0.05 would increase about 2%...
Hi @Jiminator, as only 1000 samples are used and the batch size is 8, setting "badam_switch_interval=50" will only update 8 blocks (1000\*3/(8\*50)=7.5), while Llama 3-8B has 32 blocks when using...
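The arithmetic in the comment above can be checked with a small sketch. It assumes the factor 3 is the number of training epochs, that there is no gradient accumulation, and that each switch interval trains exactly one block; the function name `blocks_visited` and the variable names are hypothetical and not part of BAdam.

```python
import math

def blocks_visited(num_samples, num_epochs, batch_size, switch_interval):
    """Estimate how many blocks get trained, assuming one block per switch interval."""
    # Total optimizer steps (assumes no gradient accumulation).
    total_steps = num_samples * num_epochs / batch_size
    # Number of switch intervals that fit in the run; a partial interval
    # still trains one more block, hence the ceiling.
    switches = total_steps / switch_interval
    return total_steps, switches, math.ceil(switches)

# Values from the comment: 1000 samples, batch size 8, switch interval 50.
# The 3 is assumed to be the epoch count.
total_steps, switches, visited = blocks_visited(1000, 3, 8, 50)

# Llama 3-8B has 32 blocks according to the comment.
num_blocks = 32
# Largest whole switch interval that still lets every block be visited once.
max_interval = math.floor(total_steps / num_blocks)

print(f"steps={total_steps}, switches={switches}, blocks visited={visited}")
print(f"interval needed to reach all {num_blocks} blocks: <= {max_interval}")
```

Under these assumptions, only about a quarter of the 32 blocks are ever updated, and the interval would have to drop to roughly 11 steps for every block to be visited once.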
Thank you for your efforts in integrating our work! BAdam now supports the model parallelism offered by DeepSpeed ZeRO-3. Integrating the latest feature only requires * setting `ds_zero3_enabled=True` when initializing the...