Marks101
Dear Triton team, I am currently debugging an issue with a kernel that is supposed to replace torch.roll followed by zeroing out the first row of a 2D matrix. This...
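The excerpt is cut off here, but a plain-PyTorch reference for the operation it describes (roll, then zero the first row) might look like the sketch below; the shift direction and axis are assumptions, since they are not spelled out above.

```python
import torch

def roll_and_zero_first_row(x: torch.Tensor) -> torch.Tensor:
    """Reference behaviour the Triton kernel is meant to replace (assumed)."""
    out = torch.roll(x, shifts=1, dims=0)  # row i receives row i-1
    out[0, :] = 0                          # discard the wrapped-around row
    return out
```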
Hello team, We are training our GPT-style models to produce additional outputs besides the logit tensor. One use case for us is token-wise classification. Another use case is attention...
### System Info
- H100 DGX
- CUDA 12.1
- TensorRT-LLM 0.10.0.dev2024041600

### Who can help?
@byshiue

### Information
- [X] The official example scripts
- [ ] My own...
Hi team, I am wondering about the definition of the attention mask in transformer-engine. I did not find an explanation in the docs. Does True mean that the position takes...
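For reference, the two possible readings of a boolean attention mask that this question is about can be illustrated as follows; the sketch does not assert which convention transformer-engine actually uses.

```python
import torch
import torch.nn.functional as F

scores = torch.randn(1, 4, 4)                    # (batch, query, key) attention scores
mask = torch.tensor([True, True, False, False])  # boolean mask over the key positions

# Reading A: True means "this position participates" -> suppress the False entries.
probs_a = F.softmax(scores.masked_fill(~mask, float("-inf")), dim=-1)

# Reading B: True means "this position is masked out" -> suppress the True entries.
probs_b = F.softmax(scores.masked_fill(mask, float("-inf")), dim=-1)
```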
[PyTorch] FlashAttention: causal masking enforced in cross attention due to sliding window attention
Hi transformer-engine team, we noticed that in a decoder layer with `self_attn_mask_type="causal"`, the following line https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/transformer.py#L594 sets `window_size = (-1, 0)`. This `window_size` is passed to both self-attention as...
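To illustrate why `(-1, 0)` implies causal masking: under the `(left, right)` window convention used by flash-attention-style kernels, `-1` means unbounded on that side. A minimal sketch, assuming queries and keys share the same positional indexing as in self-attention:

```python
import torch

def sliding_window_mask(seq_len: int, window_size=(-1, 0)) -> torch.Tensor:
    """Boolean mask (True = may attend) for a (left, right) sliding window.

    A value of -1 means "unbounded" on that side, so window_size=(-1, 0)
    allows every position to the left and none to the right, i.e. a causal mask.
    """
    left, right = window_size
    q_idx = torch.arange(seq_len).unsqueeze(1)
    k_idx = torch.arange(seq_len).unsqueeze(0)
    allowed = torch.ones(seq_len, seq_len, dtype=torch.bool)
    if left >= 0:
        allowed &= k_idx >= q_idx - left
    if right >= 0:
        allowed &= k_idx <= q_idx + right
    return allowed
```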
Hello, we noticed training instabilities in some of our FP8 trainings that use `transformer_engine.pytorch.Linear` to build a custom transformer stack. Looking into the checkpoints, we noticed that `scaling_bwd` deviates between...
Hello team, We typically use `gather_all_token_logits` to collect the logit tensors for post-processing. Especially for large vocabulary sizes (128 000), this can require a lot of GPU memory. For example,...
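To give a rough sense of the scale involved, a back-of-the-envelope estimate; only the vocabulary size comes from the text above, while the batch size, sequence length, and dtype are illustrative assumptions.

```python
# Rough memory footprint of gathering all token logits on the GPU.
vocab_size = 128_000
batch_size = 8        # assumed
seq_len = 2_048       # assumed
bytes_per_elem = 2    # fp16, assumed

logits_bytes = batch_size * seq_len * vocab_size * bytes_per_elem
print(f"{logits_bytes / 2**30:.1f} GiB")  # ~3.9 GiB for a single batch of logits
```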
# Description
Hello, we noticed a few issues with encoder-decoder models that use cross attention. I would like to suggest the following changes to fix them.

## Type of change...
Hello team, we have been debugging large scale training instabilities with FP8 and noticed that these started when updating from transformer-engine v1.2.1 to v1.7. Taking a closer look at the...
Hello team, we noticed training instabilities when combining FP8 and activation checkpointing with `transformer_engine.pytorch.checkpoint`. When taking a closer look at this, we got the feeling that the FP8 scales in...