Results: 373 comments of Andrej

There is a plan for fp8, but not int8; int8 is usually used in the context of inference, and this repo focuses on training right now. We will very likely get around...

haha good idea actually, thanks. Should we detect device type based on the device though? Leaning toward yes
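A minimal sketch of deriving the device type from the device string (an assumption about the approach, not necessarily the repo's final code):

```python
def get_device_type(device: str) -> str:
    # "cuda", "cuda:0", "cuda:1", ... all share the "cuda" device type,
    # which is the form torch.autocast expects; anything else -> "cpu"
    return "cuda" if device.startswith("cuda") else "cpu"

print(get_device_type("cuda:0"))  # cuda
print(get_device_type("cpu"))     # cpu
```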

I think this is part of a bigger refactor I'd like to make because I want both device and dtype to be configurable from args. And potentially apply the autocast...
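The refactor being described could look roughly like this (flag names are hypothetical; the torch wiring is left in comments since it depends on the rest of the script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cuda",
                    help="e.g. cpu, cuda, cuda:0")
parser.add_argument("--dtype", default="bfloat16",
                    choices=["float32", "bfloat16", "float16"])
args = parser.parse_args([])  # use defaults for this sketch

# later the strings would map to real torch objects, e.g.:
#   ptdtype = {"float32": torch.float32, "bfloat16": torch.bfloat16,
#              "float16": torch.float16}[args.dtype]
#   ctx = torch.amp.autocast(device_type=device_type, dtype=ptdtype)
print(args.device, args.dtype)
```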

CPU support is now added so closing this issue ty

Two options:
- use `dtype=torch.float32` to disable mixed precision training. Will work on anything, but slow.
- use `dtype=torch.float16` to use fp16 instead of bf16. Because the range of fp16...
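The range concern is easy to quantify: fp16 has 5 exponent bits while bf16 keeps fp32's 8, so their largest finite values differ by dozens of orders of magnitude. A quick back-of-the-envelope check in plain Python:

```python
# largest finite fp16 value: (2 - 2**-10) * 2**15
fp16_max = (2 - 2**-10) * 2**15
print(fp16_max)  # 65504.0

# largest finite bf16 value: (2 - 2**-7) * 2**127, roughly fp32's range
bf16_max = (2 - 2**-7) * 2**127
print(f"{bf16_max:.2e}")  # 3.39e+38
```

This narrow fp16 range is why the fp16 path is usually paired with gradient scaling (e.g. `torch.cuda.amp.GradScaler`), while bf16 typically needs no scaler.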

agree, a good todo item

I benchmarked both and found the decorator to speed things up, but this was before I added torch.compile. Are you using torch.compile?

wow, with torch.compile I am seeing a big difference in the opposite direction, probably as you're seeing as well. with @torch.jit.script: 138ms / iter, without: 118ms / iter
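For reference, a per-iteration timing harness of the kind used for such a comparison might look like this (a generic sketch; for CUDA kernels you would also need `torch.cuda.synchronize()` around the timed region so async kernel launches don't skew the numbers):

```python
import time

def bench(fn, iters=100, warmup=10):
    # warm up first so one-time costs (compilation, caching) aren't measured
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters  # seconds per iteration

ms = bench(lambda: sum(range(10_000)), iters=50) * 1e3
print(f"{ms:.3f} ms / iter")
```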

got it, i'm struggling a bit with the best way to fix this because i'd like to keep torch.compile optional for now. looks like we have to propagate whether `compile=True` through...

Agree it seems really gross to propagate and keep track of a boolean for whether I intend to later torch.compile the model. I do think there are legitimate uses for...
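A bare-bones illustration of the propagation being complained about (all names hypothetical, with no real torch calls, just to show the shape of the problem):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    device: str = "cuda"
    # the awkward bit: a fact about how the model will be *used later*
    # (torch.compile'd or not) has to live in its construction-time config
    compile: bool = False

def build_model(cfg: TrainConfig) -> dict:
    # hypothetical constructor: only apply @torch.jit.script to hot
    # functions when we do NOT intend to torch.compile the whole model,
    # since the scripted version measured slower under torch.compile
    return {"use_jit_script": not cfg.compile}

print(build_model(TrainConfig(compile=True)))  # {'use_jit_script': False}
```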