TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization...
Hey, I'm using the `te_gemm` function defined in the PyTorch extensions [here](https://github.com/cli99/TransformerEngine/blob/6b21f606f2459d49c2113d69236d68d334edeb4c/transformer_engine/pytorch/csrc/extensions/gemm.cu#L10), and I'm trying to apply a scaling factor to the output. My gemm inputs are in fp8e4m3 and...
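A conceptual sketch (not the actual `te_gemm` signature, whose arguments are defined in the linked extension) of how FP8 GEMM scaling factors usually compose: each FP8 operand carries a per-tensor dequantization factor (`scale_inv`), and any extra output scale is just one more multiplier folded in after the matmul. The function name and the `out_scale` parameter below are illustrative assumptions.

```python
import torch

def emulated_fp8_gemm(a_fp8, a_scale_inv, b_fp8, b_scale_inv, out_scale=1.0):
    """Emulate the scaling semantics of an FP8 GEMM in plain PyTorch.

    a_fp8, b_fp8 : quantized operands stored in torch.float8_e4m3fn
    *_scale_inv  : per-tensor dequantization factors (1 / quantization scale)
    out_scale    : illustrative extra factor applied to the GEMM output
    """
    # Dequantize to a wider dtype before the matmul; a real FP8 GEMM fuses
    # this scaling into the kernel epilogue instead of materializing copies.
    a = a_fp8.to(torch.float32) * a_scale_inv
    b = b_fp8.to(torch.float32) * b_scale_inv
    return (a @ b) * out_scale

# Quantize two random matrices to FP8 (e4m3) with simple per-tensor scales.
x = torch.randn(16, 32)
w = torch.randn(32, 64)
x_scale = 448.0 / x.abs().max()  # 448 is the e4m3 max-normal value
w_scale = 448.0 / w.abs().max()
x_fp8 = (x * x_scale).to(torch.float8_e4m3fn)
w_fp8 = (w * w_scale).to(torch.float8_e4m3fn)

y = emulated_fp8_gemm(x_fp8, 1.0 / x_scale, w_fp8, 1.0 / w_scale, out_scale=0.5)
print(y.shape)  # torch.Size([16, 64])
```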
The CMake configuration failed with the following error. The same error is observed on both the *stable* and *main* branches.
```text
-- JAX support: OFF
-- Configuring done
CMake Error at...
```
According to #438, we should be able to use both BF16 and FP8 autocasts. In our specific setting, our module consists of some linear layers that are `torch.nn.Linear` and some...
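A minimal sketch of how the two autocasts can be nested for a module that mixes plain `torch.nn.Linear` with Transformer Engine layers, assuming a Hopper/Ada GPU and the public `te.fp8_autocast` / `DelayedScaling` API; the module and recipe settings here are illustrative.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# A module mixing plain torch.nn.Linear with te.Linear layers.
class MixedBlock(torch.nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.torch_linear = torch.nn.Linear(hidden, hidden)  # stays in BF16
        self.te_linear = te.Linear(hidden, hidden)           # eligible for FP8

    def forward(self, x):
        return self.te_linear(self.torch_linear(x))

model = MixedBlock(1024).cuda()
x = torch.randn(8, 1024, device="cuda")
recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

# The outer BF16 autocast covers the torch.nn layers; the inner fp8_autocast
# only affects Transformer Engine modules such as te.Linear.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
        y = model(x)
```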
Support main_grad and fuse_wgrad_accumulation
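For context, an illustrative sketch of the pattern this PR title refers to (attribute and function names are assumptions, not the PR's actual code): each parameter owns a persistent FP32 `main_grad` buffer, and the weight-gradient GEMM accumulates into it directly instead of producing a fresh `.grad` tensor.

```python
import torch

# Illustrative pattern: a BF16 parameter paired with an FP32 main_grad buffer.
weight = torch.nn.Parameter(torch.randn(64, 64, dtype=torch.bfloat16))
weight.main_grad = torch.zeros_like(weight, dtype=torch.float32)

def accumulate_wgrad(inp: torch.Tensor, grad_out: torch.Tensor) -> None:
    # wgrad = grad_out^T @ inp, accumulated in place into the FP32 buffer,
    # which is what a fused wgrad-accumulation epilogue would do in one GEMM.
    weight.main_grad.add_(grad_out.t().float() @ inp.float())

inp = torch.randn(32, 64, dtype=torch.bfloat16)
grad_out = torch.randn(32, 64, dtype=torch.bfloat16)
accumulate_wgrad(inp, grad_out)
```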
Hi, I'm seeing higher losses using `te.Linear` over `nn.Linear` directly in transformer models such as Llama, which I assume is expected given the nature of FP8. However, I don't...
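One way to isolate whether the gap comes from FP8 itself or from the module swap is to compare the two layers outside of `fp8_autocast`, where `te.Linear` runs in BF16. A minimal sketch, assuming CUDA-resident parameters and the `weight`/`bias` attribute names of `te.Linear`:

```python
import torch
import transformer_engine.pytorch as te

torch.manual_seed(0)
hidden = 1024
x = torch.randn(16, hidden, device="cuda", dtype=torch.bfloat16)

ref = torch.nn.Linear(hidden, hidden, device="cuda", dtype=torch.bfloat16)
test = te.Linear(hidden, hidden, params_dtype=torch.bfloat16)

# Copy weights so both layers compute the same function.
with torch.no_grad():
    test.weight.copy_(ref.weight)
    test.bias.copy_(ref.bias)

# Outside fp8_autocast, te.Linear runs in BF16; a large gap here would point
# at the module swap rather than FP8 quantization.
diff = (test(x) - ref(x)).abs().max()
print(f"max abs difference without FP8: {diff.item():.3e}")
```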
In the recent PyTorch 2.2.0 release, NVFuser was deprecated in TorchScript with this [warning](https://github.com/pytorch/pytorch/blob/v2.2.0/torch/csrc/jit/python/init.cpp#L759-L762). See this [commit](https://github.com/pytorch/pytorch/commit/e6b5e0ecc609c15bfee5b383fe5c55fbdfda68ff). We are running into test failures in TransformerEngine when running the following...
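One way to keep such tests green across versions is to gate the TorchScript + NVFuser path on the PyTorch version. A minimal sketch, assuming `pytest` and the `packaging` package are available; the test name is illustrative and does not refer to an actual TE test.

```python
import pytest
import torch
from packaging import version

TORCH_GE_2_2 = version.parse(torch.__version__) >= version.parse("2.2.0")

@pytest.mark.skipif(
    TORCH_GE_2_2,
    reason="NVFuser in TorchScript is deprecated starting with PyTorch 2.2.0",
)
def test_nvfuser_jit_path():
    # Placeholder body standing in for tests that exercise the
    # TorchScript + NVFuser code path (name and body are illustrative).
    ...
```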
Setting `ub_overlap_rs_dgrad` to True in Megatron-LM raises a "Caught signal 8 (Floating point exception: integer divide by zero)" error, which was eventually traced to a problem with...
TorchDynamo has known limitations for `autograd.Function` implementations and `autograd.graph` hooks. Activation recompute utilizes *both* of those mechanisms, so this PR disables TorchDynamo on `te.distributed.checkpoint()` via the `@no_torch_dynamo()` decorator.
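A minimal sketch of how such a decorator can be built on `torch._dynamo.disable`, falling back to a no-op on PyTorch builds without Dynamo; this is a conceptual illustration, not necessarily TE's exact implementation.

```python
import torch

def no_torch_dynamo():
    """Illustrative decorator: exclude a function from TorchDynamo tracing."""
    def decorator(func):
        try:
            import torch._dynamo
            return torch._dynamo.disable(func)
        except ImportError:
            # Older PyTorch without Dynamo: nothing to disable.
            return func
    return decorator

@no_torch_dynamo()
def checkpointed_forward(fn, *args):
    # Stand-in for an activation-recompute wrapper built on autograd.Function
    # and autograd.graph hooks, which Dynamo cannot trace reliably.
    return fn(*args)
```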
Currently the number of GQA groups per partition for DotProductAttention is computed using a default value, rather than the actual value computed earlier in the initializer, causing errors when tensor...
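For illustration, the per-partition count should follow from the resolved group count and the tensor-parallel size rather than a default; a small sketch of that arithmetic (function and argument names are assumptions):

```python
def gqa_groups_per_partition(num_gqa_groups: int, tp_world_size: int) -> int:
    # The per-rank GQA group count must divide evenly across tensor-parallel
    # ranks; using a stale default here is what triggers the reported errors.
    assert num_gqa_groups % tp_world_size == 0, (
        "number of GQA groups must be divisible by the tensor-parallel size"
    )
    return num_gqa_groups // tp_world_size

# Example: 8 KV groups sharded over a tensor-parallel size of 4 -> 2 per rank.
print(gqa_groups_per_partition(8, 4))
```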