
Can we replace only some nn.Linear layers with te.Linear and keep the others unchanged?

Open zigzagcai opened this issue 8 months ago • 5 comments

zigzagcai avatar Mar 20 '25 11:03 zigzagcai

I'm not sure what you mean - whether you want to run some Linear layers in FP8 and the rest in higher precision, or, for example, run the forward pass in FP8 and the backward pass in high precision. Both of these scenarios will be possible once this PR is merged (hopefully this week).

pggPL avatar Mar 20 '25 12:03 pggPL

I'm not sure what you mean - whether you want to run some Linear layers in FP8 and the rest in higher precision, or, for example, run the forward pass in FP8 and the backward pass in high precision. Both of these scenarios will be possible once this PR is merged (hopefully this week).

Thank you! I mean running some layers in FP8 and others in high precision.

zigzagcai avatar Mar 20 '25 14:03 zigzagcai

Yes, you can do that. You can either leave some layers as nn.Linear, or you can nest the fp8_autocast context manager, something like this:

from transformer_engine.pytorch import fp8_autocast

with fp8_autocast(enabled=True):
    y = te_linear1(x)  # will compute in FP8
    with fp8_autocast(enabled=False):
        z = te_linear2(y)  # will compute in high precision
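The first option (leaving some layers as nn.Linear) can also be done programmatically when the model is already built. A minimal sketch, assuming a hypothetical swap_linears helper (not part of TransformerEngine) that replaces only the named submodules:

```python
import torch.nn as nn

def swap_linears(model, names, factory):
    """Replace the nn.Linear children whose names are in `names` with
    modules built by factory(in_features, out_features); recurse into
    everything else. This helper is illustrative, not a TE API."""
    for name, child in model.named_children():
        if name in names and isinstance(child, nn.Linear):
            setattr(model, name, factory(child.in_features, child.out_features))
        else:
            swap_linears(child, names, factory)
    return model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 32)  # will be swapped
        self.fc2 = nn.Linear(32, 16)  # stays as nn.Linear (high precision)

# In practice the factory would be transformer_engine.pytorch.Linear,
# so only fc1 runs in FP8 under fp8_autocast:
# model = swap_linears(MLP(), {"fc1"}, te.Linear)
```

Layers left as plain nn.Linear compute in high precision regardless of the surrounding fp8_autocast context, since only TE modules participate in FP8 execution.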

ptrendx avatar Mar 24 '25 16:03 ptrendx

Both of these scenarios will be possible once this https://github.com/NVIDIA/TransformerEngine/pull/1441 is merged (hopefully this week).

Hi @pggPL, it looks like the original PR has been closed and split into 4 PRs. May I know when we can expect these changes to be merged into TE?

lengerfulluse avatar Mar 27 '25 17:03 lengerfulluse

I want to merge them as soon as possible. There was a temporary shortage of reviewers due to other, higher-priority deadlines, but I hope they will be merged soon.

pggPL avatar Mar 31 '25 09:03 pggPL