DeepSpeed
add bf16 cuda kernel support
- This PR adds bf16 support to some of the CUDA kernels used by BLOOM, using a simple C++ template implementation.
- It is related to https://github.com/microsoft/DeepSpeed/pull/3041 and will help with comparisons between CUDA and CPU.
- Where possible, some existing fp16 implementations are also combined with the fp32 versions.
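For readers unfamiliar with the approach, here is a minimal host-side sketch of how one templated routine can serve several element types. It is not the code from this PR: `bf16_t`, `conv`, and `bias_add` are hypothetical names, and the bf16 type is emulated in plain C++ (bf16 keeps the upper 16 bits of an fp32 value) so the example compiles without CUDA headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical host-side stand-in for a 16-bit bfloat16 value.
struct bf16_t {
    std::uint16_t bits;
};

// float -> bf16 with round-to-nearest-even (NaN handling omitted for brevity).
inline bf16_t float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16_t{static_cast<std::uint16_t>(u >> 16)};
}

// bf16 -> float: place the 16 stored bits in the upper half of an fp32 word.
inline float bf16_to_float(bf16_t h) {
    std::uint32_t u = static_cast<std::uint32_t>(h.bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Conversion trait (hypothetical): one specialization per element type.
template <typename T>
struct conv;

template <>
struct conv<float> {
    static float to_float(float v) { return v; }
    static float from_float(float v) { return v; }
};

template <>
struct conv<bf16_t> {
    static float to_float(bf16_t v) { return bf16_to_float(v); }
    static bf16_t from_float(float v) { return float_to_bf16(v); }
};

// A single templated routine: accumulate in fp32, store in the element type T.
template <typename T>
void bias_add(T* out, const T* in, const T* bias, std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            float sum = conv<T>::to_float(in[r * cols + c]) + conv<T>::to_float(bias[c]);
            out[r * cols + c] = conv<T>::from_float(sum);
        }
    }
}
```

Calling `bias_add<float>` or `bias_add<bf16_t>` reuses the same body; supporting another element type in this sketch only requires a new `conv` specialization, which is the kind of consolidation the description refers to.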
Please help review @tjruwase @jeffra thanks~
cc @delock @rogerxfeng8
@microsoft-github-policy-service agree company="intel"
@jeffra Could you please help enable the CI workflow? I added some fixes to bypass bf16 if the CUDA arch does not support it.
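One common way to bypass a data type on unsupported hardware is a build-time guard. The sketch below only illustrates that idea and is not taken from this PR: the macro `SKETCH_BF16_AVAILABLE` and the functions `supported_dtypes` and `is_supported` are hypothetical, and the assumption is that the build system defines the macro only when the target architecture supports bf16.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Assumption: the build passes -DSKETCH_BF16_AVAILABLE only when the target
// GPU architecture supports bf16. The macro name is hypothetical.
inline std::vector<std::string> supported_dtypes() {
    std::vector<std::string> dtypes = {"fp32", "fp16"};
#ifdef SKETCH_BF16_AVAILABLE
    dtypes.push_back("bf16");  // compiled in only when the guard is defined
#endif
    return dtypes;
}

// Callers check the dtype before dispatching, so bf16 paths are skipped
// (rather than failing to compile or run) when the guard is absent.
inline bool is_supported(const std::string& dtype) {
    const std::vector<std::string> dtypes = supported_dtypes();
    return std::find(dtypes.begin(), dtypes.end(), dtype) != dtypes.end();
}
```

Without the flag, `is_supported("bf16")` returns false and the bf16 instantiations never need to be compiled.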
Fixed a typo that caused a test failure. Please relaunch the tests, thanks~ @jeffra @tjruwase
@jeffra @tjruwase All tests have passed. Could you take a look at the changes? Thanks~
@cmikeh2 The conflict is fixed. Could you relaunch the workflow? Thanks.
@cmikeh2 Sorry, I just fixed some template declaration problems in softmax.cu. Please relaunch the workflow, thanks~
@cmikeh2 @tjruwase @jeffra please relaunch workflow, thanks~
@cmikeh2 @tjruwase @jeffra The conflict is resolved. Hope it can be merged successfully this time🤣
@cmikeh2 @tjruwase @jeffra The AMD test workflow (https://github.com/microsoft/DeepSpeed/actions/runs/4673912327) failed, but I don't think it's related to my changes. I have rebased onto the master branch again; could you please relaunch the workflow? Thanks~
@cmikeh2 the nv-mii test and amd test failed again. Do you think it's related to my modification? Or just need to retry?
I think it’s likely unrelated. We sometimes see node issues with the MII tests and the AMD tests seem to be failing elsewhere as well. I’ll investigate this internally and see what we can do to get this merged in.
@cmikeh2 @tjruwase @jeffra gentle ping 🤷‍♂️
@cmikeh2 @tjruwase @jeffra gentle ping
@cmikeh2 @tjruwase @jeffra A fix for the cupy installation (https://github.com/microsoft/DeepSpeed/pull/3276) has been synced from master. Please relaunch the workflow. Thanks
@cmikeh2 @tjruwase @jeffra Hi, is the amd-mi200 test unable to launch? I see many actions pending on this workflow.
@dc3671, yes, we are not having much luck with this workflow.