TransnormerLLM issues

Bugs in Triton operator?

1

Hi. Thanks for the nice Triton implementation. I may have found a bug in the Triton operator. It seems that the operator does not support a head dim of 192, although it does support a head dim of 128...

XintianHan

Benchmark results cannot be reproduced

1

I tested transnormerllm-385m with llm-eval-harness on the boolq benchmark. However, the result does not match the result you reported. Besides the boolq benchmark and the 385m model, other benchmarks...

waneon

When will some of the Leaderboard evaluation results be made available?

1

As in the title.

RanchiZhao

Will a pretrained model of around 13b be released?

1

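Hello, I was very excited to see this project! Unfortunately, 150b is just too large. Are there plans to release a version of around 13b (for example, for code and similar tasks)?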

CrazyBoyM

The publication

1

Thanks for your great work. When will the code be released?

wangyuxin87

Differences between Lightning Attention1 and Lightning Attention2 code implementations

6

Hello, I have two questions: 1. In this repository, the implementations of lightning attention1 and lightning attention2 appear to be identical. 2. The implementation of...

Hanshifancoder
