TransformerEngine
Can we only replace part of nn.Linear with te.Linear and others keep unchanged?
I'm not sure what you mean: do you want to run some Linear layers in FP8 and the rest in higher precision, or do you want to run, for example, the forward pass in FP8 and the backward pass in high precision? Both of these scenarios will be possible once this PR is merged (hopefully this week).
Thank you! I mean running some layers in FP8 and others in high precision.
Yes, you can do that. You can either just leave some layers as nn.Linear or you can nest the fp8_autocast context manager, something like this:
with fp8_autocast(enabled=True):
    y = te_linear1(x)  # will compute in FP8
    with fp8_autocast(enabled=False):
        z = te_linear2(y)  # will compute in high precision
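For the other option (leaving some layers as nn.Linear and converting only a subset), here is a minimal sketch of a helper that swaps selected layers in place. The names `swap_linear_layers`, `should_swap`, `make_replacement`, and `StandInLinear` are hypothetical and not part of TransformerEngine; `StandInLinear` only stands in for te.Linear so the snippet runs without a GPU. How weights should be carried over to a real te.Linear is an assumption you would need to verify for your TE version.

```python
from typing import Callable, List

import torch
import torch.nn as nn


def swap_linear_layers(
    model: nn.Module,
    should_swap: Callable[[str, nn.Linear], bool],
    make_replacement: Callable[[nn.Linear], nn.Module],
) -> List[str]:
    """Replace selected nn.Linear submodules in place and return their names."""
    swapped = []
    # Snapshot the module list so the tree is not mutated while iterating.
    for name, module in list(model.named_modules()):
        if not name or not isinstance(module, nn.Linear):
            continue  # skip the root module and anything that is not a Linear
        if not should_swap(name, module):
            continue  # this layer stays a plain nn.Linear
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, make_replacement(module))
        swapped.append(name)
    return swapped


class StandInLinear(nn.Linear):
    """Hypothetical placeholder for te.Linear, used only to keep this runnable."""


def make_stand_in(old: nn.Linear) -> nn.Module:
    # With TransformerEngine installed, this is where one would build
    # te.Linear(old.in_features, old.out_features, bias=old.bias is not None)
    # and copy the parameters over (assumption: check against your TE version).
    new = StandInLinear(old.in_features, old.out_features, bias=old.bias is not None)
    new.load_state_dict(old.state_dict())
    return new


model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))
# Convert only the first layer; the last one is left unchanged.
swapped = swap_linear_layers(model, lambda name, m: name == "0", make_stand_in)
```

Selecting by module name keeps the choice explicit; the predicate could just as well filter on layer shape or position in the network.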
Both of these scenarios will be possible once https://github.com/NVIDIA/TransformerEngine/pull/1441 is merged (hopefully this week).
Hi @pggPL, it looks like the original PR has been closed and split into 4 PRs. May I know when we can expect these changes to be merged into TE?
I want to merge them as soon as possible. There was a temporary shortage of reviewers due to other, higher-priority deadlines, but I hope they will be merged soon.