flash-linear-attention
[Bug] Throughput benchmarking script fails
Checklist
- [x] I have checked FAQs and existing issues for similar problems
- [x] My GPU is H100 and I have installed `triton-nightly` built by the fla team, and double checked FAQs
- [x] Please report this bug in English to ensure wider understanding and support
Describe the Bug
```
  File "/data/cl/user/yangsl66/miniconda3/envs/fla/lib/python3.12/site-packages/fla/models/transformer/modeling_transformer.py", line 374, in forward
    logits = None if fuse_linear_and_cross_entropy else self.lm_head(hidden_states[:, -logits_to_keep:])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Boolean value of Tensor with more than one value is ambiguous
```
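For context, PyTorch raises this error whenever a multi-element tensor is evaluated in a boolean context, e.g. when a tensor ends up in a parameter slot that the code branches on. A minimal sketch reproducing the error class (hypothetical names, not the fla code path):

```python
import torch

# Hypothetical flag; imagine a tensor passed positionally into a bool parameter.
flag = torch.zeros(2)

try:
    # `None if flag else ...` calls bool(flag), which is ambiguous for
    # tensors with more than one element.
    logits = None if flag else flag.sum()
except RuntimeError as e:
    print(e)  # Boolean value of Tensor with more than one value is ambiguous
```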
Steps to Reproduce the Bug
N/A
Expected Behavior
N/A
Environment Information
- Torch:
- Triton:
This may be related to https://github.com/fla-org/flash-linear-attention/pull/401.
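One plausible mitigation, shown here only as a sketch and not as the actual fla fix (`fuse_linear_and_cross_entropy` is taken from the traceback; everything else is assumed), is to normalize the flag to a plain Python bool before branching on it:

```python
import torch

def _as_bool_flag(value) -> bool:
    # Normalize a flag to a plain bool before it reaches an `if`, failing
    # loudly on multi-element tensors instead of the ambiguous RuntimeError.
    if isinstance(value, torch.Tensor):
        if value.numel() != 1:
            raise TypeError(
                f"expected a scalar flag, got a tensor of shape {tuple(value.shape)}"
            )
        return bool(value.item())
    return bool(value)
```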
This issue is stale because it has been open for 30 days with no activity.