Onur Galoglu

@thanhlt998 Just to share my own experience: when I use `float32` and disable `remove_input_padding`, I see almost no discrepancy between the HF outputs and the TRT-LLM outputs with the Python runtime (`max_batch_size=32`).
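For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of one way to check for drift between the two runtimes (greedy decoding assumed; the model name is a placeholder, and `my_trtllm_generate` is a hypothetical stand-in for whatever TRT-LLM runtime call you use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder: use the model the engine was built from

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
hf_model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

def hf_generate(prompt: str, max_new_tokens: int = 64) -> list[int]:
    """Greedy HF reference; returns only the newly generated token ids."""
    inputs = tokenizer(prompt, return_tensors="pt")
    out = hf_model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    return out[0, inputs["input_ids"].shape[1]:].tolist()

def first_divergence(ref_ids: list[int], other_ids: list[int]) -> int | None:
    """Index of the first token where the two runtimes disagree, or None."""
    for i, (a, b) in enumerate(zip(ref_ids, other_ids)):
        if a != b:
            return i
    if len(ref_ids) != len(other_ids):
        return min(len(ref_ids), len(other_ids))
    return None

# trt_ids = my_trtllm_generate(prompt)  # hypothetical: your TRT-LLM runtime call
# idx = first_divergence(hf_generate(prompt), trt_ids)
# print("outputs match" if idx is None else f"first mismatch at token {idx}")
```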

@thanhlt998 I see your point! Just to add that I can actually use `bfloat16` in FasterTransformer without seeing this discrepancy in the outputs.

@symphonylyh thanks a lot; I can confirm that, for [this input_text](https://gist.github.com/ogaloglu/1b764913b7e24045a58c829b2216a115), the results are identical regardless of the `remove_input_padding` setting (`bfloat16` and the Python runtime are used)! I...
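A rough sketch of the check, assuming the `ModelRunner` API used in TensorRT-LLM's `examples/run.py`; the engine directories and model name are placeholders I made up, and the exact runner class may differ for enc-dec models:

```python
from transformers import AutoTokenizer
from tensorrt_llm.runtime import ModelRunner

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
prompt_ids = tokenizer("the fixed input_text from the gist", return_tensors="pt").input_ids

def generate_ids(engine_dir: str) -> list[int]:
    """Run one greedy generation and return the output id sequence."""
    runner = ModelRunner.from_dir(engine_dir)
    output = runner.generate(
        [prompt_ids[0]],
        max_new_tokens=128,
        end_id=tokenizer.eos_token_id,
        pad_id=tokenizer.eos_token_id,
    )
    return output[0, 0].tolist()  # shape assumed [batch, beam, seq]; take first beam

# One engine built with remove_input_padding enabled, one with it disabled
# (hypothetical paths).
padded = generate_ids("engines/bf16_padded")
packed = generate_ids("engines/bf16_no_padding")
print("identical" if padded == packed else "outputs diverge")
```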

@symphonylyh thank you! Looking forward to your updates!

@symphonylyh thank you for the insight! Then, I will run some experiments early next week and share the outcomes.