Tim Moon

Results: 227 comments by Tim Moon

This PR introduced a bug in Thunder's TE executor, which is fixed with https://github.com/Lightning-AI/lightning-thunder/pull/2146 (see https://github.com/Lightning-AI/lightning-thunder/pull/2146#issuecomment-2915824016).

The tensor parallel group can be a subset of the world group. We frequently split the world group into orthogonal tensor-parallel, data-parallel, and pipeline-parallel groups. Based on the error message,...
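As a rough illustration of how a world group can be split into orthogonal groups, here is a minimal sketch in plain Python that only computes rank lists. The function name `build_parallel_groups` and the rank layout (tensor-parallel ranks contiguous, then data-parallel, then pipeline-parallel) are assumptions made for this example; real frameworks may order ranks differently. Actual process groups would then be created from these lists, for instance with `torch.distributed.new_group`.

```python
from typing import Dict, List


def build_parallel_groups(
    world_size: int, tp_size: int, pp_size: int
) -> Dict[str, List[List[int]]]:
    """Partition ranks 0..world_size-1 into TP, DP, and PP rank lists.

    Assumed (hypothetical) layout:
        rank = (pp * dp_size + dp) * tp_size + tp
    """
    if world_size % (tp_size * pp_size) != 0:
        raise ValueError("world_size must be divisible by tp_size * pp_size")
    dp_size = world_size // (tp_size * pp_size)

    def rank(pp: int, dp: int, tp: int) -> int:
        return (pp * dp_size + dp) * tp_size + tp

    # Each group varies exactly one coordinate and fixes the other two.
    tp_groups = [
        [rank(pp, dp, tp) for tp in range(tp_size)]
        for pp in range(pp_size)
        for dp in range(dp_size)
    ]
    dp_groups = [
        [rank(pp, dp, tp) for dp in range(dp_size)]
        for pp in range(pp_size)
        for tp in range(tp_size)
    ]
    pp_groups = [
        [rank(pp, dp, tp) for pp in range(pp_size)]
        for dp in range(dp_size)
        for tp in range(tp_size)
    ]
    return {"tp": tp_groups, "dp": dp_groups, "pp": pp_groups}
```

With 8 ranks, `tp_size=2`, and `pp_size=2`, every tensor-parallel group holds two ranks, so each one is a strict subset of the world group.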

The Thunder integration bug is fixed with https://github.com/Lightning-AI/lightning-thunder/pull/1826.

This looks like an import error, probably from Flash Attention. Our import logic has an unfortunate side effect of suppressing error messages (see https://github.com/NVIDIA/TransformerEngine/pull/862#pullrequestreview-2072546018), so can you try replacing `import...
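One generic way to surface an import error that wrapper logic hides is to import the suspect module directly and print the full traceback. The sketch below is not TE's import logic; `import_with_traceback` is a hypothetical helper, and which module to probe (for example `flash_attn`) is an assumption you would adjust to your setup.

```python
import importlib
import traceback
from typing import Optional


def import_with_traceback(module_name: str) -> Optional[str]:
    """Import a module by name.

    Returns None on success, or the full traceback text on failure,
    so the underlying error is visible instead of being swallowed.
    """
    try:
        importlib.import_module(module_name)
    except Exception:
        # Covers ImportError as well as OSError from missing shared libraries.
        return traceback.format_exc()
    return None
```

Calling `print(import_with_traceback("flash_attn"))` in the affected environment should print either `None` or the original error with its stack trace.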

Can you check if TE has built the required shared libraries? In particular, `/NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine` should contain `libtransformer_engine.so` and something that looks like `transformer_engine_torch.cpython-310-x86_64-linux-gnu.so`. If your TE install has `libtransformer_engine.so` but...
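A quick way to run this check is to list the matching files in the package directory. This is a minimal sketch: `find_te_shared_libs` is a hypothetical helper, and the glob pattern for the PyTorch extension is an assumption based on the file name quoted above.

```python
from pathlib import Path
from typing import Dict, List


def find_te_shared_libs(package_dir: str) -> Dict[str, List[str]]:
    """List the shared libraries found directly inside a TE package directory."""
    root = Path(package_dir)
    # Core library, expected under exactly this name.
    core = sorted(p.name for p in root.glob("libtransformer_engine.so"))
    # Framework extension; the suffix depends on Python version and platform.
    torch_ext = sorted(p.name for p in root.glob("transformer_engine_torch*.so"))
    return {"core": core, "torch_extension": torch_ext}
```

For the install discussed here, you would pass `/NeMo-Aligner/venv/lib/python3.10/site-packages/transformer_engine` as `package_dir`; an empty list under either key points to a missing library.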

This error message is hard for us to debug. As a convenience, the root `transformer_engine` package attempts to import the extensions for both PyTorch and JAX. However, it's unlikely that...

To use RMSNorm by itself, you can simply construct a [`te.RMSNorm`](https://github.com/NVIDIA/TransformerEngine/blob/744624d004f4514ffbaa90ac83e214311c86c607/transformer_engine/pytorch/module/rmsnorm.py#L89) module:

```python
import torch
import transformer_engine.pytorch as te

# TE module
layer = te.RMSNorm(128)

# Synthetic data
x =...
```

Can you share more information on your configuration, especially which DL framework you're building with? Passing the `--verbose` flag to `pip install` would also provide more useful build logs. A...

With https://github.com/NVIDIA/TransformerEngine/pull/987, you can control the number of parallel build jobs with the `MAX_JOBS` environment variable.

Hm, I'd expect most systems could handle building with `MAX_JOBS=1`. I wonder if we could get more clues if you build with verbose output (`pip install -v -v .`).
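For scripted builds, the same invocation can be assembled from Python. This is only a sketch: `build_install_invocation` is a hypothetical helper, and it assumes the build is started from the source directory with the `pip` that belongs to the current interpreter.

```python
import os
import sys
from typing import Dict, List, Tuple


def build_install_invocation(
    max_jobs: int = 1, source_dir: str = "."
) -> Tuple[List[str], Dict[str, str]]:
    """Return the pip command and environment for a verbose, job-limited build."""
    env = dict(os.environ)
    env["MAX_JOBS"] = str(max_jobs)  # limits the number of parallel build jobs
    cmd = [sys.executable, "-m", "pip", "install", "-v", "-v", source_dir]
    return cmd, env


# To actually run the build (not executed here):
#   import subprocess
#   cmd, env = build_install_invocation(max_jobs=1)
#   subprocess.run(cmd, env=env, check=True)
```

Redirecting the output of that run to a file makes it easier to attach the full verbose log to an issue.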