
Reproducibility

Open christopher5106 opened this issue 10 months ago • 7 comments

Hi, I'm trying to achieve reproducibility with your fp8 quantization: half of the time, I'm not getting the expected image. When I deactivate quantization, I get reproducible results. Do you have any idea what the reason could be? It's very strange, since I set the seed for everything: the random package, torch, etc. Thank you for your help.

christopher5106 · Feb 25 '25 18:02
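For reference, "setting the seed for everything" usually looks like the helper below (a general sketch, not part of flux-fp8-api; `seed_everything` is a hypothetical name). It covers Python's hash seed, the `random` module, and, when available, NumPy and torch:

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Seed every RNG source we can reach (general sketch, not this repo's API)."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # seeds the CPU generator
        torch.cuda.manual_seed_all(seed)  # seeds all CUDA device generators
    except ImportError:
        pass
```

As the thread goes on to show, seeding alone does not guarantee bitwise-identical images: numerics inside the quantized kernels can still vary.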

Well, float8 does affect precision, and the added error means you won't get the same image as you would without quantization. If you're loading the model multiple times, it's possible that a slightly different scale is found for the float8 scaling term, since the api runs a generation to initialize the float8 scales; that would likely produce slightly different images each time you start the api again, especially if you're not using compile. After you've generated ~2 images, results should become reproducible within the currently loaded model (though not across previous runs of the api).

aredden · Feb 25 '25 21:02
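The scale initialization described above can be illustrated with a minimal per-tensor float8 scaling sketch (`compute_scale` and the numbers are illustrative, not the repo's actual code). Because the scale is derived from activations observed during a warmup generation, different warmup inputs yield slightly different scales, and therefore slightly different quantized outputs:

```python
F8_E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3

def compute_scale(activation_amax: float) -> float:
    # Map the observed absolute-max activation onto the float8 range;
    # quantization multiplies by this scale before casting to fp8.
    return F8_E4M3_MAX / activation_amax

# Two warmup runs that observe slightly different activation maxima
# produce slightly different scales, hence slightly different images.
scale_run_1 = compute_scale(12.7)
scale_run_2 = compute_scale(13.1)
```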

Thanks @aredden for your explanation. To clarify: first, I set all seeds for all generators (random, torch, etc.), as I usually do for diffusers. Then I run a first inference (always the same one), then I optionally quantize and compile, then run one inference to make the compilation effective, and only then start the actual inferences.

Without quantization, the inferences always give the same results. With quantization, I get the same series of images only about every other run. How can a completely seeded Python program produce images that differ strongly from the expected ones?

christopher5106 · Feb 25 '25 21:02

Found it. This is due to fast accumulation. Thanks!

christopher5106 · Feb 25 '25 22:02
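Why fast accumulation can break bitwise reproducibility: low-precision floating-point addition is not associative, so a matmul kernel that reorders its partial sums (as fast-accumulation paths may do) can return different bits for the same inputs. A minimal pure-Python demonstration, rounding each partial sum to float32 to mimic a limited-precision accumulator:

```python
import struct

def f32(x: float) -> float:
    # Round a Python float to float32 precision via pack/unpack.
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(values) -> float:
    # Accumulate sequentially, rounding after every addition,
    # mimicking a low-precision hardware accumulator.
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

order_a = [1e8, 1.0, -1e8, 1.0]   # the 1.0 next to 1e8 is absorbed
order_b = [1e8, -1e8, 1.0, 1.0]   # large terms cancel first

sum_a = accumulate(order_a)  # → 1.0: the first +1.0 is lost to rounding
sum_b = accumulate(order_b)  # → 2.0: both +1.0 terms survive
```

Same inputs, same seed, different summation order, different result; that is the nondeterminism seeding cannot remove.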

I now have reproducibility on H100, but still not on H200, which is why I'm reopening the issue.

christopher5106 · Feb 26 '25 09:02

It sounds like the H100 and H200 have slightly different architectures and kernel implementations?

christopher5106 · Feb 26 '25 09:02
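When chasing cross-GPU differences like this, the standard PyTorch determinism knobs are worth setting (these are general CUDA/PyTorch settings, not flux-fp8-api options; they reduce run-to-run variation on one machine but do not guarantee bitwise equality across different GPUs, which may select different kernels):

```shell
# Deterministic cuBLAS workspace selection; required by
# torch.use_deterministic_algorithms(True) for cuBLAS ops:
export CUBLAS_WORKSPACE_CONFIG=:4096:8
# Fix Python's hash randomization as well:
export PYTHONHASHSEED=0
```

Inside Python, `torch.use_deterministic_algorithms(True)` additionally makes PyTorch raise an error on known-nondeterministic kernels instead of letting results vary silently.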

Could you please share the config you used to reproduce these benchmarks on an H100? I haven't found it documented anywhere.

baptistecumin · Feb 28 '25 14:02

Sorry, I'm not sure I understand the question, if it's addressed to me. I'm running inferences with the quantized flux schnell model, and I get images that differ from the expected ones when using scaled mm with the fast accumulator.

christopher5106 · Mar 07 '25 20:03