Simran Arora
Hi, is there any resolution to this question about the initialization and recommended training configs needed to reproduce the paper results? I am also seeing some instability with the default configs....
Thanks so much! I had used layer norm and did not set bias=False. Will try switching these. Adding the explicit deepnorm initialization also improved stability for my downstream runs,...
Here you go!
- Blog: [https://hazyresearch.stanford.edu/blog/2025-11-09-hk](https://t.co/y5bCIHV1Xq)
- Blog: [https://hazyresearch.stanford.edu/blog/2025-11-09-amd-brr](https://t.co/qDjnQ4uhxK)
- Paper: [https://hazyresearch.stanford.edu/static/posts/2025-11-09-hk/hipkittens.pdf](https://t.co/0iU9vwNDjc)
- Code: [https://github.com/HazyResearch/HipKittens](https://t.co/qKNB4CWU8H)
Not yet, but we'd love contributions if you want to add it in!
Hi, can you please provide a line number? I'm not fully sure what you are referring to.
Hi, what is the error? The implementation configs are provided in train/configs/experiments/reference/
Also, how do I run all the kernels here: https://github.com/modular/modular/tree/main/max/kernels/src/linalg/matmul/gpu/amd
Also, I don't understand how to run the kbench instructions that someone suggested on my old ticket. The first line of the README results in: ```bash benchmarks/autotune# br //:install...
Hi! Sorry for the slow response! Has the demo script that we provided been working for you for the tk kernel? My suspicion is that the padding is not being...
We do not have an A100 kernel at this time. You could try using the fast transformers kernel in the repo, or something from Flash Linear Attention to speed up...