Ma, Guokai
@jeffra are there any comments on the overall structure, direction, etc. for device abstraction or selection?
> @tjruwase Thanks for the reminder. I'll read the PR and raise questions in the comments.
In the latest accelerator runtime interface, pin_memory() is used to wrap the tensor interface: t.pin_memory() --> accel_runtime.pin_memory(t). In the previous code, t.pin_memory() was translated to t.pin_memory(device=accel_runtime.current_device()). However, this only works for the latest...
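
A minimal sketch of the call-site conversion described above, assuming a CUDA-backed accel_runtime object (the class and method bodies are illustrative only, not the actual implementation in this PR, and the example requires a CUDA-capable environment):

```python
import torch

class CudaAccelerator:
    """Illustrative accelerator runtime; each backend would provide its own."""

    def current_device(self) -> int:
        return torch.cuda.current_device()

    def pin_memory(self, tensor: torch.Tensor) -> torch.Tensor:
        # Delegating to the tensor method keeps today's CUDA behaviour; other
        # accelerators can override this with their own pinning logic, or
        # return the tensor unchanged if pinning is not supported.
        return tensor.pin_memory()

accel_runtime = CudaAccelerator()

# call-site change: t.pin_memory()  -->  accel_runtime.pin_memory(t)
t = accel_runtime.pin_memory(torch.empty(1024))
```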
> We merged the class definition and now we are modifying all accel_runtime and literal_device call sites to use get_accelerator(). We are still testing internally before we push the change...
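
For reference, a hedged sketch of what that call-site migration could look like once get_accelerator() is the single entry point (method names such as device_name() and pin_memory() are assumptions based on the abstraction discussed here, not a confirmed interface of this PR):

```python
import torch
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()

# hard-coded device strings become accelerator queries
# before: device = torch.device('cuda', 0)
device = torch.device(accel.device_name(0))

# accel_runtime helpers become methods on the accelerator object
# before: t = accel_runtime.pin_memory(torch.empty(1024))
t = accel.pin_memory(torch.empty(1024))
```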
@tjruwase if we want to add a workflow for an xpu device, which is located outside Azure but is remotely accessible, is that technically possible? We want to assess the possibility of gating CUDA...
@tjruwase can I get approval from a maintainer to run workflows for the new changes?
We are working on an OpBuilder abstraction in our internal repo, which allows kernels and SYCLOpBuilder (or any accelerator builders) to be put in a separate extension package. We will add it to this PR...
The OpBuilder abstraction has been added to this PR; we will update the description to explain the mechanism.
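
Until the description is updated, here is a rough sketch of how such a split could work. Everything below other than the OpBuilder and SYCLOpBuilder names is hypothetical; the real builder interface in the PR may differ:

```python
from abc import ABC, abstractmethod

class OpBuilder(ABC):
    """Generic builder: knows which sources make up a kernel extension."""
    NAME = "base_op"

    @abstractmethod
    def sources(self) -> list:
        ...

    def load(self):
        # A real builder would JIT-compile self.sources() for its backend.
        raise NotImplementedError

# Shipped in a separate extension package (e.g. a SYCL/XPU package),
# so the core repo stays free of backend-specific build logic.
class SYCLOpBuilder(OpBuilder):
    NAME = "sycl_op"

    def sources(self) -> list:
        return ["csrc/sycl/example_kernel.dp.cpp"]

    def load(self):
        print(f"building {self.NAME} from {self.sources()}")

# The accelerator object can then hand out its own builder class.
def create_op_builder(accelerator_name: str) -> OpBuilder:
    builders = {"xpu": SYCLOpBuilder}
    return builders[accelerator_name]()

create_op_builder("xpu").load()
```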
@inkcherry Is there a link to the demo code? I'm interested in the potential use case of this feature proposal.
This PR should address this discussion: https://github.com/microsoft/DeepSpeed/discussions/4930