[REQUEST] torch.compile + DeepSpeed
I am looking into running DeepSpeed with torch.compile and am facing several issues related to tracing the hooks.
DeepSpeed Stage 2 backward hook tracing with Compiled Autograd
- Accessing `param.grad` directly fails while tracing the model with AOTAutograd, because `param.grad` is not populated during tracing. Reading the `.grad` field this way is not recommended with compiled autograd.
- Several parts of the implementation itself cause graph breaks, e.g. calling `id(param)` (see the sketch after this list).
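To make the pattern concrete, here is a rough, hypothetical hook (not DeepSpeed's actual implementation) that mirrors both points: a per-parameter gradient hook that reads `param.grad` directly and uses `id(param)` as a bucket key.

```python
# Hypothetical sketch, not DeepSpeed's code: a per-parameter gradient hook that
# reads param.grad and keys buckets by id(param).
import torch
import torch.nn as nn

model = nn.Linear(16, 16)

def make_hook(param):
    def hook(*_):
        grad = param.grad        # under compiled autograd / AOTAutograd tracing,
                                 # .grad is not yet populated at this point
        bucket_key = id(param)   # id() on a tensor is untraceable -> graph break
        return grad, bucket_key
    return hook

for p in model.parameters():
    # fires after the gradient for p has been accumulated
    p.register_post_accumulate_grad_hook(make_hook(p))

model(torch.randn(4, 16)).sum().backward()  # fine in eager; tracing the backward is not
```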
With DeepSpeed Stage 3, torch.compile itself fails while tracing the forward hooks. A similar issue shows up when tracing model parallelism.
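A rough reproduction sketch of the Stage 3 case (toy model and placeholder config; run under the `deepspeed` launcher). The failure shows up when Dynamo has to trace the parameter gather/partition forward hooks that ZeRO-3 installs on the module.

```python
# Hypothetical minimal repro; model and config values are placeholders.
import torch
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2))
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "zero_optimization": {"stage": 3},   # ZeRO Stage 3 partitions parameters
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
                                       model_parameters=model.parameters())

compiled = torch.compile(engine.module)   # Dynamo must trace ZeRO-3's parameter
x = torch.randn(4, 32).to(engine.device)  # gather/partition forward hooks, which
loss = compiled(x).sum()                  # is where tracing fails
engine.backward(loss)
```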
There is an effort from PyTorch to make FSDP traceable. Could you share whether there is a similar effort to enable DeepSpeed with torch.compile, or a list of features that currently work with torch.compile?
Hoping this gets supported as soon as possible. It would be very useful for LLMs.
Hi @sssiva81, @BobLiu20,
I submitted a draft PR #4878 to enable torch.compile. Please feel free to try it. You can also check an example on Megatron-DeepSpeed.
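A minimal usage sketch, assuming an engine-level `compile()` entry point; the exact API the PR exposes may differ, and applying `torch.compile` to `engine.module` is the generic fallback.

```python
# Hedged sketch only; engine.compile() below is an assumption, and the model and
# config values are placeholders. Run under the deepspeed launcher.
import torch
import torch.nn as nn
import deepspeed

model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 8))
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
engine, _, _, _ = deepspeed.initialize(model=model, config=ds_config,
                                       model_parameters=model.parameters())

engine.compile()                          # assumed entry point; otherwise use
                                          # torch.compile(engine.module)
x = torch.randn(8, 64).to(engine.device)
loss = engine(x).sum()
engine.backward(loss)
engine.step()
```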
Thanks @tohtana. Is there a similar effort to enable pipeline parallelism with torch.compile? Currently it fails if we torch.compile the pipeline module, because of this assertion: `assert isinstance(model, deepspeed.PipelineEngine)`.
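For reference, a rough sketch of how the assertion trips (placeholder layers and config; run under the `deepspeed` launcher): torch.compile wraps the `PipelineModule` in an `OptimizedModule`, so `deepspeed.initialize` no longer recognizes it as a pipeline model.

```python
# Hypothetical illustration with placeholder layers/config.
import torch
import torch.nn as nn
import deepspeed
from deepspeed.pipe import PipelineModule

deepspeed.init_distributed()  # PipelineModule needs torch.distributed initialized

pipe = PipelineModule(layers=[nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2)],
                      num_stages=1)
compiled = torch.compile(pipe)  # wraps pipe in an OptimizedModule, which is no
                                # longer an instance of PipelineModule

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
}
engine, _, _, _ = deepspeed.initialize(model=compiled, config=ds_config,
                                       model_parameters=compiled.parameters())

# deepspeed.initialize keys off isinstance(model, PipelineModule), so it builds a
# plain DeepSpeedEngine here, and training code that expects pipeline semantics
# fails at exactly this check:
assert isinstance(engine, deepspeed.PipelineEngine)
```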
Hi @sssiva81, the assertion was fixed by #5197. Sorry for the delay.