direct-preference-optimization
Qwen model issue: embedding output becomes inf and loss becomes NaN
After a loss backward pass and an optimizer step, the next forward pass yields inf in the embedding layer's output hidden states, and the loss becomes NaN.
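To narrow down where the non-finite values first appear, one option is to scan the weights and gradients right after the optimizer step. The following is only a minimal, framework-agnostic sketch: `find_non_finite` is a hypothetical helper, and the tensor names and values in `snapshot` are made up for illustration, not taken from an actual run.

```python
import math


def find_non_finite(named_values):
    """Return {name: (n_nan, n_inf)} for every entry that contains NaN or inf.

    `named_values` maps a name to a sequence of Python floats.
    """
    report = {}
    for name, values in named_values.items():
        n_nan = 0
        n_inf = 0
        for v in values:
            if math.isnan(v):
                n_nan += 1
            elif math.isinf(v):
                n_inf += 1
        if n_nan or n_inf:
            report[name] = (n_nan, n_inf)
    return report


# Hypothetical snapshot taken after optimizer.step(); names and values are
# illustrative assumptions only.
snapshot = {
    "embed_tokens.weight": [0.01, float("inf"), -0.3],
    "embed_tokens.weight.grad": [float("nan"), 0.2, 0.0],
    "lm_head.weight": [0.5, -0.1, 0.0],
}

bad = find_non_finite(snapshot)
for name, (n_nan, n_inf) in bad.items():
    print(f"{name}: {n_nan} NaN, {n_inf} inf")
```

With PyTorch, the snapshot could be built from `model.named_parameters()`, for example `{n: p.detach().float().flatten().tolist() for n, p in model.named_parameters()}`, although for a large model checking `torch.isfinite(p).all()` per parameter is much cheaper. If the gradients are already non-finite before the step, the problem is in the backward pass rather than in the optimizer update.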