candle
Implement DeepSeek V3/R1
- [x] Model
- [x] FP8 weight dequantize
- [x] Tensor parallelism
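Since much of the discussion below concerns the new f8 dtype, here is a minimal sketch of FP8 (e4m3) dequantization in plain Rust with no candle dependencies. The function and block-scale helper names are illustrative only, not candle's actual API; the per-block scaling mirrors the style of DeepSeek's FP8 checkpoints, but the block size and layout here are assumptions.

```rust
// Minimal FP8 (e4m3) -> f32 dequantization sketch.
// Layout: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
// The e4m3fn variant has no infinities; exponent 0b1111 with
// mantissa 0b111 encodes NaN.
fn f8e4m3_to_f32(byte: u8) -> f32 {
    let sign = if byte & 0x80 != 0 { -1.0f32 } else { 1.0f32 };
    let exp = ((byte >> 3) & 0x0f) as i32;
    let mant = (byte & 0x07) as f32;
    if exp == 0x0f && (byte & 0x07) == 0x07 {
        return f32::NAN; // the single NaN encoding per sign
    }
    if exp == 0 {
        // Subnormal: no implicit leading 1, exponent fixed at 1 - bias = -6.
        sign * (mant / 8.0) * (-6.0f32).exp2()
    } else {
        // Normal: implicit leading 1, unbiased exponent = exp - bias.
        sign * (1.0 + mant / 8.0) * ((exp - 7) as f32).exp2()
    }
}

// Block-wise dequantize: each decoded value is multiplied by its block's
// scaling factor (hypothetical helper; DeepSeek-style FP8 weights ship
// per-block scales alongside the raw bytes).
fn dequantize_block(bytes: &[u8], scale: f32) -> Vec<f32> {
    bytes.iter().map(|&b| f8e4m3_to_f32(b) * scale).collect()
}

fn main() {
    assert_eq!(f8e4m3_to_f32(0x38), 1.0);  // 0b0_0111_000 -> +1.0
    assert_eq!(f8e4m3_to_f32(0xc0), -2.0); // 0b1_1000_000 -> -2.0
    assert_eq!(dequantize_block(&[0x38, 0x40], 0.5), vec![0.5, 1.0]);
}
```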
Seems like most of this PR is adding the f8 dtype; perhaps it would be worth separating that out from the DeepSeek model addition so it's easier to review?
This would be great, and the F8 type would be really helpful for other projects.
It looks like the working branch is behind the main branch now, so maybe the fp8 work should be pulled out. It looks like a lot of good work by @EricLBuehler.
Interesting idea. I'll take a look at doing that once a few other PRs are merged - specifically the Metal MM speedup one.
Is this supposed to be able to support training?