flash-linear-attention
[Bug] Throughput benchmarking script fails
Checklist
- [x] I have checked FAQs and existing issues for similar problems
- [x] My GPU is H100 and I have installed `triton-nightly` built by the fla team, and double checked FAQs
- [x] Please report this bug in English to ensure wider understanding and support
Describe the Bug
```
  File "/data/cl/user/yangsl66/miniconda3/envs/fla/lib/python3.12/site-packages/fla/models/transformer/modeling_transformer.py", line 374, in forward
    logits = None if fuse_linear_and_cross_entropy else self.lm_head(hidden_states[:, -logits_to_keep:])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
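For context, PyTorch raises this error whenever a multi-element tensor is evaluated in a boolean context, e.g. when a tensor ends up in a parameter slot that the code branches on. A minimal sketch reproducing the error class (hypothetical names, not the fla code path):

```python
import torch

# Hypothetical flag; imagine a tensor passed positionally into a bool parameter.
flag = torch.zeros(2)

try:
    # `None if flag else ...` calls bool(flag), which is ambiguous for
    # tensors with more than one element.
    logits = None if flag else flag.sum()
except RuntimeError as e:
    print(e)  # Boolean value of Tensor with more than one value is ambiguous
```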
Steps to Reproduce the Bug
N/A
Expected Behavior
N/A
Environment Information
- Torch:
- Triton:
This may be related to https://github.com/fla-org/flash-linear-attention/pull/401.
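One plausible mitigation, shown here only as a sketch and not as the actual fla fix (`fuse_linear_and_cross_entropy` is taken from the traceback; everything else is assumed), is to normalize the flag to a plain Python bool before branching on it:

```python
import torch

def _as_bool_flag(value) -> bool:
    # Normalize a flag to a plain bool before it reaches an `if`, failing
    # loudly on multi-element tensors instead of the ambiguous RuntimeError.
    if isinstance(value, torch.Tensor):
        if value.numel() != 1:
            raise TypeError(
                f"expected a scalar flag, got a tensor of shape {tuple(value.shape)}"
            )
        return bool(value.item())
    return bool(value)
```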
This issue is stale because it has been open for 30 days with no activity.