
Implement DeepSeek V3/R1

Open EricLBuehler opened this issue 10 months ago • 5 comments

  • [x] Model
  • [x] FP8 weight dequantize
  • [x] Tensor parallelism

EricLBuehler avatar Jan 27 '25 18:01 EricLBuehler

Seems like most of this PR is adding the f8 dtype. Perhaps it's worth separating that out from the DeepSeek model addition so it's easier to review?

zackangelo avatar Feb 24 '25 21:02 zackangelo

This would be great, and the F8 type is really helpful for other projects.

AlpineVibrations avatar Apr 23 '25 17:04 AlpineVibrations

It looks like the working branch is behind the main branch now. Maybe the FP8 work should be pulled out. It looks like a lot of good work by @EricLBuehler.

AlpineVibrations avatar Apr 23 '25 20:04 AlpineVibrations

Interesting idea. I'll take a look at doing that once a few other PRs are merged - specifically the Metal MM speedup one.

EricLBuehler avatar Apr 23 '25 21:04 EricLBuehler

Is this supposed to be able to support training?

AlbertMarashi avatar Apr 29 '25 10:04 AlbertMarashi