gpt-fast
[WIP] Use DTensor-based tensor parallel
Stack from ghstack (oldest at bottom):
- -> #180
Status:
- Switched to DTensor-based TP in the regular tensor path
- Results are correct, but there is a perf gap (it seems to issue extra collectives at the beginning; investigating)
- TODO: switch the quantized path to DTensor too
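
For context on the perf gap noted above, here is a minimal sketch of the arithmetic behind tensor parallelism for a two-layer MLP. It is a pure-Python simulation and does not use PyTorch or DTensor, and it is not the code in this PR. All helper names (`matmul`, `split_cols`, `split_rows`, `all_reduce_sum`, `tp_mlp`) are hypothetical, and the column-wise then row-wise layout is an assumption about a typical TP scheme. Under that assumption, one sum all-reduce per MLP block is enough to recover the unsharded result, so any additional collectives would be pure overhead.

```python
# Pure-Python simulation of a tensor-parallel MLP; no PyTorch or DTensor involved.
# Every helper below is a hypothetical illustration, not part of gpt-fast.

def matmul(a, b):
    # Plain matrix multiply on lists of lists.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    # Element-wise activation; it commutes with column sharding.
    return [[max(0, v) for v in row] for row in m]

def split_cols(m, world_size):
    # Column-wise shards of the first weight: each rank gets a slice of output features.
    step = len(m[0]) // world_size
    return [[row[r * step:(r + 1) * step] for row in m] for r in range(world_size)]

def split_rows(m, world_size):
    # Row-wise shards of the second weight: each rank gets the matching input features.
    step = len(m) // world_size
    return [m[r * step:(r + 1) * step] for r in range(world_size)]

def all_reduce_sum(partials):
    # Stand-in for the single collective: element-wise sum across ranks.
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def tp_mlp(x, w1, w2, world_size):
    w1_shards = split_cols(w1, world_size)
    w2_shards = split_rows(w2, world_size)
    # Each "rank" computes a partial output from its own shards, with no communication.
    partials = [
        matmul(relu(matmul(x, w1_shards[r])), w2_shards[r])
        for r in range(world_size)
    ]
    # One all-reduce combines the partial outputs.
    return all_reduce_sum(partials)

x = [[1, -2], [3, 4]]
w1 = [[1, 0, -1, 2], [2, 1, 0, -1]]
w2 = [[1, 2], [0, 1], [-1, 0], [2, 1]]

reference = matmul(relu(matmul(x, w1)), w2)  # unsharded result
sharded = tp_mlp(x, w1, w2, world_size=2)    # simulated 2-way TP
assert sharded == reference
```

Integer inputs keep the comparison exact. With floating-point weights the sharded and unsharded results can differ by rounding, so a tolerance-based check would be needed instead.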