gpt-fast

Published 20 hours ago


[WIP] Use DTensor-based tensor parallel

Open kwen2501 opened this issue 1 year ago • 0 comments

Stack from ghstack (oldest at bottom):

-> #180

Status:

Switched to DTensor-based TP in the regular tensor path.
Results are correct, but there is a perf gap: it seems to perform extra collectives at the beginning (investigating).
TODO: switch the quantized path to DTensor too.
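
For context on the perf-gap item, here is a minimal pure-Python sketch of the arithmetic behind a column-wise plus row-wise sharded two-layer MLP, the usual tensor-parallel layout for a feed-forward block. It is not the code in this PR and does not use the DTensor API: every name in it (`tp_mlp`, `all_reduce_sum`, `collective_calls`) is hypothetical, the ranks are simulated in a loop, and ReLU stands in for the model's real activation. Under those assumptions, the sharded result matches the unsharded one with a single all-reduce per block, which is the baseline that any extra collectives would be measured against.

```python
# Pure-Python simulation of a column-wise + row-wise sharded two-layer MLP.
# NOT the DTensor API: all names are hypothetical and "ranks" run in a loop.

def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(m):
    """Elementwise ReLU (an assumed stand-in for the real activation)."""
    return [[max(v, 0) for v in row] for row in m]

def split_cols(m, parts):
    """Shard a matrix along its columns into `parts` equal pieces."""
    width = len(m[0]) // parts
    return [[row[i * width:(i + 1) * width] for row in m] for i in range(parts)]

def split_rows(m, parts):
    """Shard a matrix along its rows into `parts` equal pieces."""
    height = len(m) // parts
    return [m[i * height:(i + 1) * height] for i in range(parts)]

collective_calls = 0  # counts simulated collectives

def all_reduce_sum(partials):
    """Simulated all-reduce: elementwise sum of every rank's partial output."""
    global collective_calls
    collective_calls += 1
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*partials)]

def tp_mlp(x, w1, w2, world_size):
    """First weight sharded by columns, second by rows, one all-reduce at the end."""
    partials = []
    for w1_shard, w2_shard in zip(split_cols(w1, world_size), split_rows(w2, world_size)):
        hidden = relu(matmul(x, w1_shard))   # local: no communication needed
        partials.append(matmul(hidden, w2_shard))  # partial sum of the output
    return all_reduce_sum(partials)

def reference_mlp(x, w1, w2):
    """Unsharded computation used to check correctness."""
    return matmul(relu(matmul(x, w1)), w2)

x = [[1, -2], [3, 4]]
w1 = [[1, 0, -1, 2], [2, 1, 0, -1]]
w2 = [[1, 2], [0, 1], [-1, 0], [3, 1]]

sharded = tp_mlp(x, w1, w2, world_size=2)
reference = reference_mlp(x, w1, w2)
print(sharded == reference, collective_calls)
```

The elementwise activation is what lets the hidden tensor stay sharded between the two matmuls, so only the final partial sums need to be combined.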

Jun 12 '24 20:06 kwen2501