Haiyang Huang

5 comments by Haiyang Huang

The problem seems to be rooted in the ds_qkv_gemm implementation under FP16. The kernel works fine with FP32 inputs. However, when running under FP16, only the inp_norm can be...

Here is a screenshot of the output of the same script run at different precisions. On the left are the results of a dense layer given FP32, and on the right are the results...

Sure, here is the script I'm using. I made some modifications to deepspeed/module_inject/replace_module.py to ensure that the args and flags are respected by the deepspeed.init_inference() function. Besides the fp16 and kernel...

Thank you for your reply! I passed the make test by changing some of the configuration I was using, but I am not sure whether I really solved all the problems I...