1096125073
I’m experiencing the same problems
Hi, TeleChat2 is a language model developed independently by China Telecom AI Company. This PR was opened to make it easier for users to apply the superior AWQ algorithm.
https://huggingface.co/Tele-AI/TeleChat2-7B-32K
Is there any way to ensure that the engine generated by the build is identical every time? This is important for engineering deployment.
I disabled custom_all_reduce when building the engine.
> Hi @1096125073, different batch sizes may lead to different kernels, so the results can be different. This is a known issue.

Thank you for your answer! I'm...
> @1096125073 Yes, I get your point: repeat the same input prompt 4 times, and make it a batch, but the outputs are different from batch size 1. Unfortunately, it's...
> @1096125073 Do you use multiple GPUs? If you use multi-GPU, you can set NCCL_ALGO=Tree to ensure a stable reduce order. NCCL usually selects the Ring algorithm, which has an unstable reduce order,...
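
To illustrate why reduce order matters in the reply above: floating-point addition is not associative, so summing the same values in a different order can give a slightly different result. The sketch below shows this with plain Python floats and then sets `NCCL_ALGO` from Python. The variable names are only illustrative, and it assumes the variable is set before the process initializes NCCL; whether this is enough to make outputs fully reproducible in a given setup is not confirmed here.

```python
import os

# Floating-point addition is not associative: the same three values
# summed in a different order produce slightly different results.
a, b, c = 0.1, 0.2, 0.3
sum_left_first = (a + b) + c    # one possible reduce order
sum_right_first = a + (b + c)   # another possible reduce order
order_matters = sum_left_first != sum_right_first

# NCCL reads NCCL_ALGO from the environment. Assumption: this runs
# before any NCCL communicator is created in the process.
os.environ["NCCL_ALGO"] = "Tree"

print("reduce order changes the sum:", order_matters)
print("NCCL_ALGO =", os.environ["NCCL_ALGO"])
```

The same variable can also be exported in the shell before launching the job, which avoids any question about when the process reads it.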