Results: 373 comments of Andrej

There is a plan for fp8, but not int8; int8 is usually used in the context of inference, and this repo focuses on training right now. We will very likely get around...

haha good idea actually, thanks. Should we detect device type based on the device though? Leaning toward yes
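A minimal sketch of deriving the device type from the device string (an assumption about the approach, not necessarily the repo's final code):

```python
def get_device_type(device: str) -> str:
    # "cuda", "cuda:0", "cuda:1", ... all share the "cuda" device type,
    # which is the form torch.autocast expects; anything else -> "cpu"
    return "cuda" if device.startswith("cuda") else "cpu"

print(get_device_type("cuda:0"))  # cuda
print(get_device_type("cpu"))     # cpu
```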

I think this is part of a bigger refactor I'd like to make because I want both device and dtype to be configurable from args. And potentially apply the autocast...
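The refactor being described could look roughly like this (flag names are hypothetical; the torch wiring is left in comments since it depends on the rest of the script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--device", default="cuda",
                    help="e.g. cpu, cuda, cuda:0")
parser.add_argument("--dtype", default="bfloat16",
                    choices=["float32", "bfloat16", "float16"])
args = parser.parse_args([])  # use defaults for this sketch

# later the strings would map to real torch objects, e.g.:
#   ptdtype = {"float32": torch.float32, "bfloat16": torch.bfloat16,
#              "float16": torch.float16}[args.dtype]
#   ctx = torch.amp.autocast(device_type=device_type, dtype=ptdtype)
print(args.device, args.dtype)
```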

CPU support is now added so closing this issue ty

Two options:
- use `dtype=torch.float32` to disable mixed precision training. Will work on anything, but slow.
- use `dtype=torch.float16` to use fp16 instead of bf16. Because the range of fp16...
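The range concern is easy to quantify: fp16 has 5 exponent bits while bf16 keeps fp32's 8, so their largest finite values differ by dozens of orders of magnitude. A quick back-of-the-envelope check in plain Python:

```python
# largest finite fp16 value: (2 - 2**-10) * 2**15
fp16_max = (2 - 2**-10) * 2**15
print(fp16_max)  # 65504.0

# largest finite bf16 value: (2 - 2**-7) * 2**127, roughly fp32's range
bf16_max = (2 - 2**-7) * 2**127
print(f"{bf16_max:.2e}")  # 3.39e+38
```

This narrow fp16 range is why the fp16 path is usually paired with gradient scaling (e.g. `torch.cuda.amp.GradScaler`), while bf16 typically needs no scaler.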

agree, a good todo item

I benchmarked both and found the decorator to speed things up, but this was before I added torch.compile. Are you using torch.compile?

wow, with torch.compile I am seeing a big difference in the opposite direction, probably as you're seeing as well. with @torch.jit.script: 138ms / iter, without: 118ms / iter
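For reference, a per-iteration timing harness of the kind used for such a comparison might look like this (a generic sketch; for CUDA kernels you would also need `torch.cuda.synchronize()` around the timed region so async kernel launches don't skew the numbers):

```python
import time

def bench(fn, iters=100, warmup=10):
    # warm up first so one-time costs (compilation, caching) aren't measured
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters  # seconds per iteration

ms = bench(lambda: sum(range(10_000)), iters=50) * 1e3
print(f"{ms:.3f} ms / iter")
```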

got it, i'm struggling a bit with the best way to fix this because i'd like to keep torch.compile optional for now. looks like we have to propagate whether `compile=True` through...

Agree it seems really gross to propagate and keep track of a boolean for whether I intend to later torch.compile the model. I do think there are legitimate uses for...
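A bare-bones illustration of the propagation being complained about (all names hypothetical, with no real torch calls, just to show the shape of the problem):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    device: str = "cuda"
    # the awkward bit: a fact about how the model will be *used later*
    # (torch.compile'd or not) has to live in its construction-time config
    compile: bool = False

def build_model(cfg: TrainConfig) -> dict:
    # hypothetical constructor: only apply @torch.jit.script to hot
    # functions when we do NOT intend to torch.compile the whole model,
    # since the scripted version measured slower under torch.compile
    return {"use_jit_script": not cfg.compile}

print(build_model(TrainConfig(compile=True)))  # {'use_jit_script': False}
```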