Stas Bekman


Perhaps going forward it'd be easiest to report TFLOPS not as a single number, but as something like `415(bf16) 490(fp8)` - then fp4, mxfp4, etc. can be added as well...
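For illustration, a minimal sketch of what such a per-dtype breakdown could look like (the numbers and the `results` dict are made up, not an actual benchmark API):

```python
# Hypothetical per-dtype throughput numbers, just to show the format:
results = {"bf16": 415, "fp8": 490}  # measured TFLOPS per dtype

# Render as "415(bf16) 490(fp8)"; new dtypes (fp4, mxfp4, ...) just extend the dict
print(" ".join(f"{tflops}({dtype})" for dtype, tflops in results.items()))
```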

I personally don't use MFU as it's a BS number to begin with, since [100% is unachievable](https://github.com/stas00/ml-engineering/tree/master/compute/accelerator#maximum-achievable-flops) and, moreover, the achievable efficiency varies wildly between GPUs. So if you move from B200...
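A sketch of why the denominator matters (all numbers here are assumptions for illustration: the advertised peak is the H100 bf16 spec, and the "maximum achievable" figure stands in for the best real-world matmul throughput one could actually measure on that GPU):

```python
achieved_tflops = 415.0        # measured training throughput (assumed)
peak_tflops = 989.0            # vendor-advertised H100 bf16 peak
max_achievable_tflops = 780.0  # best measurable matmul throughput (assumed)

mfu = achieved_tflops / peak_tflops                   # denominator is unreachable in practice
normalized = achieved_tflops / max_achievable_tflops  # against what is actually attainable
print(f"MFU={mfu:.1%} vs achievable-normalized={normalized:.1%}")
```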

yup, but as I suggested above it's probably best not to average, since an important signal will be lost if one of the dtypes has a worse implementation than...

Yes, that's why I'm suggesting a breakdown report. Users should care a lot about reported TFLOPS and try to improve those. If they don't, it will cost them $$ and...

tokens/s is also a very vague metric, useful only for local relative comparisons, and even then one has to be very careful - this number alone is meaningless. Due to...
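One way to see why tokens/s alone is meaningless: converting it into model FLOPS requires knowing the model size. A back-of-envelope sketch with made-up numbers, using the common ~6 FLOPs per parameter per token approximation for training (which ignores the attention term):

```python
n_params = 7e9           # model size, e.g. 7B parameters (assumed)
tokens_per_sec = 10_000  # measured throughput (assumed)

# ~6*N FLOPs per token for training (fwd + bwd), attention term ignored
tflops = 6 * n_params * tokens_per_sec / 1e12
print(f"~{tflops:.0f} TFLOPS")  # the same tokens/s on a 70B model would be 10x the FLOPS
```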

or even:
```
python -c "from torchtune.modules import TiedLinear"
```
which is what's really needed to use `torchtune.training._activation_offloading.OffloadActivations`. I'm on pt-2.4 at the moment. At the very least torchtune could...

Understood! Would it be too difficult to do a runtime check for the pytorch version in `__init__.py` and tell the user if there is a mismatch and what exact version is required?...
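A minimal sketch of what such a check in `torchtune/__init__.py` could look like (the minimum version string here is hypothetical, not torchtune's actual requirement):

```python
import torch
from packaging.version import parse

_MIN_TORCH = "2.5.0"  # hypothetical minimum version, for illustration only
if parse(torch.__version__) < parse(_MIN_TORCH):
    raise ImportError(
        f"torchtune requires torch>={_MIN_TORCH}, but torch=={torch.__version__} "
        "is installed. Please upgrade pytorch."
    )
```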

The code that fails because the version is wrong will not have a warning next to it explaining to the user what is wrong. Therefore an assert would be by far...

Oh, thank you very much for explaining, Joe. I had initially understood that the latest version of pytorch was required. Normally, in all other frameworks I've worked on, we tested at...

> I will add that there is a per-request `add_special_tokens` parameter that can be used with both (3) and (4) which will control whether the BOS token is added

I...
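(The "(3)" and "(4)" refer to options discussed earlier in the thread.) As an analogy for what this flag controls, here is the same-named parameter in the HuggingFace `transformers` tokenizer API; the model name is just an example:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
print(tok("hello", add_special_tokens=True).input_ids)   # BOS (id 1) prepended
print(tok("hello", add_special_tokens=False).input_ids)  # raw tokens only, no BOS
```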