Fabian Joswig

13 comments by Fabian Joswig

I resolved the merge conflicts and fixed a few smaller issues, but now one of the tests, `test_rwms`, is failing. Can you have a look, @jkuhl-uni?

Thanks a lot, @jkuhl-uni! As we have accumulated quite a few changes, I would suggest that we all do a thorough review and then merge this first version with...

We just stumbled upon this issue and compared the implementation of the RMSNorm between TransformerEngine and [TensorRT-LLM](https://github.com/NVIDIA/TensorRT-LLM/blob/b8fc6633ba27550dea5f33641f5d94b6c7f02125/tensorrt_llm/functional.py#L5157). It looks like TensorRT-LLM does the weight multiplication in lower precision, consistent with...
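
To make the precision question concrete, here is a minimal sketch of the two orderings being compared. It is not taken from TransformerEngine or TensorRT-LLM. The function names `rmsnorm_weight_high` and `rmsnorm_weight_low`, the `eps` default, and the sample values are all assumptions for illustration. Python floats (double precision) stand in for the higher-precision compute type, and `struct`'s `"e"` format is used to round to IEEE 754 half precision.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def rmsnorm_weight_high(x, weight, eps=1e-5):
    """Normalize and apply the weight in high precision, then cast once."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [to_half((v / rms) * w) for v, w in zip(x, weight)]


def rmsnorm_weight_low(x, weight, eps=1e-5):
    """Normalize in high precision, cast to half, then apply the weight in half."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    normed = [to_half(v / rms) for v in x]  # first rounding
    # The product of two half values is exact in double precision,
    # so rounding it once gives a correctly rounded half multiply.
    return [to_half(n * to_half(w)) for n, w in zip(normed, weight)]


# Hypothetical inputs, stored in half precision as they would be in a model.
x = [to_half(v) for v in (0.1234, -1.5678, 2.7182, 0.0031)]
weight = [to_half(w) for w in (1.0371, 0.9821, 1.2503, 0.7777)]

high = rmsnorm_weight_high(x, weight)
low = rmsnorm_weight_low(x, weight)
max_abs_diff = max(abs(a - b) for a, b in zip(high, low))
print(high, low, max_abs_diff)
```

The second variant rounds twice (after normalization and after the weight multiplication), so its output can differ from the first by roughly one half-precision unit in the last place. With a weight of exactly 1.0 the two variants agree.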