
[REQUEST] torch.compile + DeepSpeed

Open · sssiva81 opened this issue

I am looking into running DeepSpeed with torch.compile and am running into multiple issues related to tracing the hooks.

DeepSpeed Stage 2 backward hook tracing with Compiled Autograd

  1. Accessing param.grad directly fails when tracing the model with AOTAutograd, because param.grad is not populated during tracing. Reading the .grad field directly is not the recommended pattern with compiled autograd (see the sketch after this list).
  2. Several parts of the implementation itself cause graph breaks, e.g. calling id(param).
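
For reference, the pattern that plays better with compiled autograd is to receive the gradient through a hook argument rather than reading the parameter's .grad field inside traced code. A minimal sketch in plain PyTorch (not DeepSpeed's actual stage-2 hooks):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)
grads = {}

# The hook receives the gradient as an argument, so nothing inside the traced
# region needs to read the mutable param.grad field.
for name, param in model.named_parameters():
    param.register_hook(lambda grad, name=name: grads.__setitem__(name, grad))

loss = model(torch.randn(4, 8)).sum()
loss.backward()
print({name: g.shape for name, g in grads.items()})
```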

With DeepSpeed Stage 3, torch.compile itself fails while tracing the forward hook (see the sketch below). A similar issue exists when tracing model parallelism.
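
For context, ZeRO Stage 3 gathers and releases partitioned parameters through module forward pre-/post-hooks, so torch.compile has to trace through those hooks. A stripped-down sketch of the pattern (the hook bodies here are hypothetical no-op stubs, not DeepSpeed's code):

```python
import torch
import torch.nn as nn

def gather_params(module, inputs):
    # In ZeRO-3 this would all-gather the partitioned parameters before forward.
    return None

def release_params(module, inputs, output):
    # In ZeRO-3 this would re-partition/free the gathered parameters after forward.
    return None

model = nn.Linear(8, 8)
model.register_forward_pre_hook(gather_params)
model.register_forward_hook(release_params)

# torch.compile traces the hooks together with forward; collectives and other
# side effects inside them are a common source of graph breaks or tracing errors.
compiled = torch.compile(model)
print(compiled(torch.randn(4, 8)).shape)
```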

There is an effort from PyTorch to make FSDP traceable. Could you share whether there is a similar effort to enable DeepSpeed with torch.compile, or a list of features that are currently supported with torch.compile?

sssiva81 avatar Nov 14 '23 07:11 sssiva81

Hoping this is supported as soon as possible. It would be very useful for LLM training.

BobLiu20 avatar Nov 16 '23 08:11 BobLiu20

Hi @sssiva81, @BobLiu20,

I submitted a draft PR #4878 to enable torch.compile. Please feel free to try it. You can also check an example on Megatron-DeepSpeed.
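
Roughly, one way to try it looks like the following (a minimal sketch; the exact entry point in the PR may differ, and the ZeRO hooks can still cause graph breaks):

```python
import torch
import torch.nn as nn
import deepspeed

# Hypothetical minimal config for illustration only.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},
}

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 1))
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Compile the wrapped module; the DeepSpeed engine still drives backward/step.
engine.module = torch.compile(engine.module)

x = torch.randn(4, 8).to(engine.device)
loss = engine(x).mean()
engine.backward(loss)
engine.step()
```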

tohtana avatar Dec 28 '23 02:12 tohtana

Thanks @tohtana. Is there any similar effort to enable pipeline parallelism with torch.compile? Currently it fails if we torch.compile the pipeline module, because of this assertion: assert isinstance(model, deepspeed.PipelineEngine)
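
For anyone hitting the same assertion: torch.compile returns a wrapper module rather than an instance of the original class, which is why strict isinstance checks break once the module is compiled. A generic illustration with a plain nn.Module (not DeepSpeed's classes):

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def forward(self, x):
        return x + 1

compiled = torch.compile(Toy())

# torch.compile wraps the module, so the compiled object is not an instance of
# the original class; the original module is still held inside the wrapper.
print(type(compiled).__name__)    # e.g. OptimizedModule
print(isinstance(compiled, Toy))  # False, which is what trips the assertion
```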

sssiva81 avatar Jan 04 '24 16:01 sssiva81

Hi @sssiva81, the assertion was fixed by #5197. Sorry for the delay.

tohtana avatar Feb 27 '24 17:02 tohtana