
add bf16 cuda kernel support

Open dc3671 opened this issue 1 year ago • 14 comments

  • This PR adds bf16 support to some of the CUDA kernels used by BLOOM, using a simple C++ template implementation.

  • It's related to https://github.com/microsoft/DeepSpeed/pull/3041, and will make it easier to compare the CUDA and CPU paths.

  • Also, some existing fp16 implementations are merged with their fp32 versions where possible.
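For readers unfamiliar with the approach, here is a minimal host-only sketch of the general idea: one templated body plus a per-type conversion trait, so that fp32 and a 16-bit type share a single implementation. It is not the code from this PR. `bf16_t`, `conversion`, and `bias_add` are hypothetical names made up for illustration, and the bf16 conversion truncates instead of rounding to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for a 16-bit bfloat type
// (assumption for illustration; a real kernel would use the CUDA bf16 type).
struct bf16_t {
    std::uint16_t bits;
};

// Conversion trait: one specialization per element type.
template <typename T>
struct conversion;

template <>
struct conversion<float> {
    static float to_float(float v) { return v; }
    static float from_float(float v) { return v; }
};

template <>
struct conversion<bf16_t> {
    // bf16 stores the upper 16 bits of an IEEE-754 float32.
    static float to_float(bf16_t v) {
        std::uint32_t u = static_cast<std::uint32_t>(v.bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof(f));
        return f;
    }
    // Truncating conversion for brevity; production code would normally round.
    static bf16_t from_float(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof(u));
        return bf16_t{static_cast<std::uint16_t>(u >> 16)};
    }
};

// Single templated body shared by every element type:
// widen to float, do the math, narrow back to T.
template <typename T>
void bias_add(std::vector<T>& x, const std::vector<T>& bias) {
    assert(!bias.empty());
    for (std::size_t i = 0; i < x.size(); ++i) {
        float acc = conversion<T>::to_float(x[i]) +
                    conversion<T>::to_float(bias[i % bias.size()]);
        x[i] = conversion<T>::from_float(acc);
    }
}
```

With this layout, supporting another element type means adding one `conversion` specialization instead of duplicating the function body.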

Please help review @tjruwase @jeffra thanks~

cc @delock @rogerxfeng8

dc3671 avatar Mar 24 '23 09:03 dc3671

@microsoft-github-policy-service agree company="intel"

dc3671 avatar Mar 24 '23 09:03 dc3671

@jeffra Could you plz help enable the CI workflow? I added some fixes to bypass bf16 if the CUDA arch does not support it.
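As a rough illustration of what such a bypass can look like (a sketch only, not the actual change in this PR): the bf16 path is compiled in only when a build-time macro is defined, and requests for it are rejected otherwise. The macro name `BF16_AVAILABLE` is assumed here to be set by the build when the target arch supports bf16; `DType`, `dtype_supported`, and `select_kernel` are hypothetical names.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical dtype tag used only in this sketch.
enum class DType { fp32, fp16, bf16 };

// Assumption: the build defines BF16_AVAILABLE only when the target
// architecture supports bf16. Without it, bf16 is reported as unsupported.
bool dtype_supported(DType t) {
#ifdef BF16_AVAILABLE
    (void)t;  // every dtype is available in this configuration
    return true;
#else
    return t != DType::bf16;
#endif
}

// Returns a label for the kernel instantiation that would be launched,
// or throws if the dtype was compiled out.
std::string select_kernel(DType t) {
    if (!dtype_supported(t)) {
        throw std::invalid_argument("bf16 is not supported on this build");
    }
    switch (t) {
        case DType::fp32: return "kernel<float>";
        case DType::fp16: return "kernel<half>";
        case DType::bf16: return "kernel<bf16>";
    }
    return "";  // unreachable for valid enum values
}
```

Tests can then call `dtype_supported` up front and skip the bf16 cases instead of failing on older hardware.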

dc3671 avatar Mar 28 '23 06:03 dc3671

Fixed a typo that caused a test failure. Plz relaunch the tests, thx~ @jeffra @tjruwase

dc3671 avatar Mar 30 '23 08:03 dc3671

@jeffra @tjruwase all tests have passed. Could you have a look at the changes? Thanks~

dc3671 avatar Mar 31 '23 02:03 dc3671

@cmikeh2 The conflict is fixed. Could you relaunch the workflow? Thanks.

dc3671 avatar Apr 07 '23 04:04 dc3671

@cmikeh2 Sry, I just fixed some template declaration problems in softmax.cu. Please relaunch the workflow, thanks~

dc3671 avatar Apr 07 '23 07:04 dc3671

@cmikeh2 @tjruwase @jeffra please relaunch workflow, thanks~

dc3671 avatar Apr 11 '23 02:04 dc3671

@cmikeh2 @tjruwase @jeffra The conflict is resolved. Hope it can be merged successfully this time🤣

dc3671 avatar Apr 12 '23 02:04 dc3671

@cmikeh2 @tjruwase @jeffra the amd test workflow (https://github.com/microsoft/DeepSpeed/actions/runs/4673912327) failed, but I don't think it's related to my changes... I have also rebased onto the master branch again, could you please relaunch the workflow? Thanks~

dc3671 avatar Apr 12 '23 04:04 dc3671

@cmikeh2 the nv-mii test and amd test failed again. Do you think it's related to my modification? Or just need to retry?

dc3671 avatar Apr 12 '23 05:04 dc3671

@cmikeh2 the nv-mii test and amd test failed again. Do you think it's related to my modification? Or just need to retry?

I think it’s likely unrelated. We sometimes see node issues with the MII tests and the AMD tests seem to be failing elsewhere as well. I’ll investigate this internally and see what we can do to get this merged in.

cmikeh2 avatar Apr 12 '23 06:04 cmikeh2

@cmikeh2 @tjruwase @jeffra gentle ping 🤷‍♂️

dc3671 avatar Apr 14 '23 02:04 dc3671

@cmikeh2 @tjruwase @jeffra gentle ping

dc3671 avatar Apr 17 '23 02:04 dc3671

@cmikeh2 @tjruwase @jeffra A fix for the cupy installation (https://github.com/microsoft/DeepSpeed/pull/3276) has been synced from master. Please relaunch the workflow. Thanks

dc3671 avatar Apr 19 '23 03:04 dc3671

@cmikeh2 @tjruwase @jeffra hi, is the amd-mi200 test unable to launch? I see many actions pending on this workflow.

dc3671 avatar Apr 22 '23 14:04 dc3671

@dc3671, yes, we are not having much luck with this workflow.

tjruwase avatar Apr 22 '23 14:04 tjruwase