SpecForge
[Feature] Support FP8 target model during training
Checklist
- [x] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/SpecForge/discussions/new/choose. Otherwise, it will be closed.
- [x] 2. Please use English, otherwise it will be closed.
Motivation
Please support FP8 target models during training to reduce GPU memory usage.
Related resources
No response
@zyksir has already added support for this. I think the feature will be merged soon.
@yubofredwang Could you please point out which PR you are referring to, if there is one?
@yubofredwang I have the same issue. Could you please point it out? The current implementation doesn't support FP8 (e.g. fine-grained FP8), and simple type casting is not enough because the activation scale is computed online. In a distributed setting, each GPU computes a different scale because it holds a different chunk of the original data.
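
To make the point concrete, here is a minimal sketch (not the SpecForge implementation, and simplified to per-tensor scaling rather than fine-grained/block-wise FP8) of why naive casting breaks under dynamic scaling: each rank derives its scale from its own shard of the activation, so the scales diverge unless they are synchronized, e.g. with an all-reduce of the absolute maximum. The helper name `quantize_fp8_per_tensor` is illustrative.

```python
# Minimal sketch: dynamic per-tensor FP8 quantization of a sharded activation.
# Assumes PyTorch >= 2.1 (torch.float8_e4m3fn) and an initialized process group.
import torch
import torch.distributed as dist

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # ~448 for e4m3


def quantize_fp8_per_tensor(x: torch.Tensor, sync_scale: bool = True):
    """Quantize a local activation shard to FP8 with an online per-tensor scale."""
    amax = x.abs().amax().float()
    if sync_scale and dist.is_initialized():
        # Without this all-reduce, every rank computes a different amax from its
        # local chunk, so the resulting scales (and dequantized values) diverge.
        dist.all_reduce(amax, op=dist.ReduceOp.MAX)
    scale = amax.clamp(min=1e-12) / FP8_MAX
    x_fp8 = (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return x_fp8, scale  # dequantize later as x_fp8.float() * scale
```

This is only meant to show why a simple dtype cast is insufficient; fine-grained FP8 additionally keeps one scale per block, which has the same synchronization concern whenever a block is split across ranks.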