Simran Arora comments

Results: 10 comments of Simran Arora

RetNet training config

Hi, is there any resolution to this question about the initialization and recommended training configs needed to reproduce the paper results? I am also seeing some instability with the default configs....

RetNet training config

Thanks so much! I had used layer norm and did not set bias=False. Will try switching these. Adding the explicit DeepNorm initialization also improved stability for my downstream runs,...
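For readers unfamiliar with the term, here is a minimal sketch of what an "explicit DeepNorm initialization" usually involves. It assumes the comment refers to the decoder-only constants from the DeepNet paper: a residual scale alpha = (2N)^(1/4) and an initialization gain beta = (8N)^(-1/4) for an N-layer stack. The function names `deepnorm_constants` and `deepnorm_residual` are hypothetical and are not taken from the RetNet code; which weights receive the gain in that codebase is not stated in the thread.

```python
from typing import List, Tuple


def deepnorm_constants(num_layers: int) -> Tuple[float, float]:
    """Return (alpha, beta) for a decoder-only stack of `num_layers` layers.

    Assumption: DeepNet-style constants, alpha = (2N)^(1/4), beta = (8N)^(-1/4).
    """
    if num_layers <= 0:
        raise ValueError("num_layers must be positive")
    alpha = (2.0 * num_layers) ** 0.25   # scales the residual branch
    beta = (8.0 * num_layers) ** -0.25   # gain applied when initializing weights
    return alpha, beta


def deepnorm_residual(x: List[float], sublayer_out: List[float], alpha: float) -> List[float]:
    """Compute alpha * x + f(x) elementwise; the normalization layer is applied to this sum."""
    return [alpha * xi + fi for xi, fi in zip(x, sublayer_out)]


alpha, beta = deepnorm_constants(8)
print(alpha, beta)
```

In a real model, beta would typically be passed as the gain of a Xavier-style initializer for selected projection weights, and alpha would multiply the skip connection before the norm layer. Both choices should be checked against the repository's own configs.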

When will ThunderKittens support AMD GPUs, specifically the W7900?

Here you go!
- Blog: [https://hazyresearch.stanford.edu/blog/2025-11-09-hk](https://t.co/y5bCIHV1Xq)
- Blog: [https://hazyresearch.stanford.edu/blog/2025-11-09-amd-brr](https://t.co/qDjnQ4uhxK)
- Paper: [https://hazyresearch.stanford.edu/static/posts/2025-11-09-hk/hipkittens.pdf](https://t.co/0iU9vwNDjc)
- Code: [https://github.com/HazyResearch/HipKittens](https://t.co/qKNB4CWU8H)

Does ThunderKittens support integer operations?

Not yet, but we'd love contributions if you want to add it in!

Bug in evaluate? Why is there no guard when using majority voting (MV)? Isn't it only needed for ws?

Hi, can you please provide a line number? I'm not fully sure what you are referring to.

Swiglu Issue

Hi, what is the error? The implementation configs are provided in train/configs/experiments/reference/

[BUG]: AMD GEMM

Also, how do I run all the kernels here: https://github.com/modular/modular/tree/main/max/kernels/src/linalg/matmul/gpu/amd

[BUG]: AMD GEMM

Also I don't understand how to run the instructions for kbench as someone suggested on my old ticket: The first line of the README results in: ```bash benchmarks/autotune# br //:install...

Using TK kernels results in a bad model

Hi! Sorry for the slow response! Has the demo script that we provided been working for you with the TK kernel? My suspicion is that the padding is not being...

ThunderKittens Hedgehog doesn't support A100 (cannot use lolcats_llama_window_tk_gen)

We do not have an A100 kernel at this time. You could try using the fast transformers kernel in the repo, or something from Flash Linear Attention, to speed up...