torchtitan
Remove unnecessary .to() inside model forward
Stack from ghstack (oldest at bottom):
- #161
- -> #298
This appears to be a holdover from how initialization previously worked.
`freqs_cis` should already be on the GPU device after initialization, so the extra `.to()` call in the forward pass is redundant.
See this conversation for reference.
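
As a rough illustration of why the call is redundant, here is a minimal sketch. It assumes `freqs_cis` is held as a registered buffer on the module; `TinyModel`, `precompute_freqs_cis`, and the sizes used are hypothetical stand-ins, not the actual torchtitan code. A registered buffer moves with the module when `model.to(device)` is called, so the forward pass can slice it directly.

```python
import torch
import torch.nn as nn


def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    # Hypothetical helper: complex rotary frequencies of shape (end, dim // 2).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2)[: dim // 2].float() / dim))
    t = torch.arange(end).float()
    angles = torch.outer(t, freqs)
    return torch.polar(torch.ones_like(angles), angles)


class TinyModel(nn.Module):
    # Hypothetical stand-in for the real model, for illustration only.
    def __init__(self, dim: int = 8, max_seq_len: int = 16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        # Assumption: freqs_cis is a buffer, so it follows the module across devices.
        self.register_buffer(
            "freqs_cis", precompute_freqs_cis(dim, max_seq_len), persistent=False
        )

    def forward(self, x: torch.Tensor):
        seq_len = x.shape[1]
        # No .to(x.device) here: the buffer already lives on the module's device.
        freqs_cis = self.freqs_cis[:seq_len]
        return self.proj(x), freqs_cis


# Falls back to CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyModel().to(device)
x = torch.randn(2, 4, 8, device=device)
out, freqs = model(x)
same_device = freqs.device == x.device
```

If `freqs_cis` were instead a plain tensor attribute rather than a buffer, `model.to(device)` would not move it, and an explicit transfer would still be needed somewhere during initialization.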