Tobias Hinz
Results: 2 issues of Tobias Hinz
Hi, we are looking into training some transformer models with FP8 and we see a lot of overhead on the CPU side when `te.Linear` layers are scheduled in the forward...
performance
According to #438, we should be able to use both BF16 and FP8 autocasts. In our specific setting, our module consists of some linear layers that are `torch.nn.Linear` and some...
bug