Ma, Guokai
@jeffra are there any comments on the overall structure, direction, etc. for device abstraction or selection?
> @tjruwase Thanks for the reminder. I'll read the PR and raise questions in the comments.
In the latest accelerator runtime interface, pin_memory() is used to wrap the tensor interface: t.pin_memory() --> accel_runtime.pin_memory(t). In the previous code, t.pin_memory() was translated to t.pin_memory(device=accel_runtime.current_device()). However, this only works for the latest...
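
A minimal sketch of the call-site conversion described above, assuming a CUDA-backed accel_runtime object (the class and method bodies are illustrative only, not the actual implementation in this PR, and the example requires a CUDA-capable environment):

```python
import torch

class CudaAccelerator:
    """Illustrative accelerator runtime; each backend would provide its own."""

    def current_device(self) -> int:
        return torch.cuda.current_device()

    def pin_memory(self, tensor: torch.Tensor) -> torch.Tensor:
        # Delegating to the tensor method keeps today's CUDA behaviour; other
        # accelerators can override this with their own pinning logic, or
        # return the tensor unchanged if pinning is not supported.
        return tensor.pin_memory()

accel_runtime = CudaAccelerator()

# call-site change: t.pin_memory()  -->  accel_runtime.pin_memory(t)
t = accel_runtime.pin_memory(torch.empty(1024))
```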
> We merged the class definition and now we are modifying all accel_runtime and literal_device call sites to use get_accelerator(). We are still testing internally before we push the change...
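
For reference, a hedged sketch of what that call-site migration could look like once get_accelerator() is the single entry point (method names such as device_name() and pin_memory() are assumptions based on the abstraction discussed here, not a confirmed interface of this PR):

```python
import torch
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()

# hard-coded device strings become accelerator queries
# before: device = torch.device('cuda', 0)
device = torch.device(accel.device_name(0))

# accel_runtime helpers become methods on the accelerator object
# before: t = accel_runtime.pin_memory(torch.empty(1024))
t = accel.pin_memory(torch.empty(1024))
```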
@tjruwase if we want to add a workflow for an xpu device, which is located outside Azure but is remotely accessible, is that technically possible? We want to assess the possibility of gating CUDA...
@tjruwase can I get approval from a maintainer to run workflows for the new changes?
We are working on an OpBuilder abstraction in our internal repo, which allows kernels and SYCLOpBuilder (or any accelerator builders) to be put in a separate extension package. We will add it to this PR...
The OpBuilder abstraction has been added to this PR; we will update the description to explain the mechanism.
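
Until the description is updated, here is a rough sketch of how such a split could work. Everything below other than the OpBuilder and SYCLOpBuilder names is hypothetical; the real builder interface in the PR may differ:

```python
from abc import ABC, abstractmethod

class OpBuilder(ABC):
    """Generic builder: knows which sources make up a kernel extension."""
    NAME = "base_op"

    @abstractmethod
    def sources(self) -> list:
        ...

    def load(self):
        # A real builder would JIT-compile self.sources() for its backend.
        raise NotImplementedError

# Shipped in a separate extension package (e.g. a SYCL/XPU package),
# so the core repo stays free of backend-specific build logic.
class SYCLOpBuilder(OpBuilder):
    NAME = "sycl_op"

    def sources(self) -> list:
        return ["csrc/sycl/example_kernel.dp.cpp"]

    def load(self):
        print(f"building {self.NAME} from {self.sources()}")

# The accelerator object can then hand out its own builder class.
def create_op_builder(accelerator_name: str) -> OpBuilder:
    builders = {"xpu": SYCLOpBuilder}
    return builders[accelerator_name]()

create_op_builder("xpu").load()
```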
@inkcherry Is there a link to the demo code? I'm interested in the potential use case of this feature proposal.
This PR should address this discussion: https://github.com/microsoft/DeepSpeed/discussions/4930