Masaki Kozuki

Results: 167 comments of Masaki Kozuki

This is for ddp. We could reuse this for the fsdp backward as well, but the semantics would differ from what we have today.

Setting: 20240729 nightly image & 8 A100-SXM4-80GB devices

### Platypus-30B

If I tweak the number of layers to 36, it works. W/o the tweak, it fails due to out of...

I took memory snapshots of both to see whether the memory increase comes from the training step.

main
![image](https://github.com/user-attachments/assets/686bdae4-40ca-4332-90b3-6b394e5e328f)

pr
![image](https://github.com/user-attachments/assets/6e7fc815-9791-4483-bd46-c4764a804000)

It seems that the difference comes from outside...

> Just so I understand the snapshot above, the blue markers are memory allocation during the training step right? Do we know the reason why `fsdp(jit(model))` has higher consumption? Is...

https://github.com/Lightning-AI/lightning-thunder/issues/564 could be related

@jjsjann123 would you have any idea about the comment of https://github.com/Lightning-AI/lightning-thunder/pull/936#issuecomment-2274967390?

`thunder.jit`-ing the following function with nvfuserex fails with the message below. By moving the copy for `a += b` to the end of a trace and replacing `a += b`...
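The reordering described above can be illustrated with a toy stand-in (the `Tensor` class and `f_functionalized` below are hypothetical, not Thunder's API): the in-place `a += b` is traced as an out-of-place add, and the copy back into the argument is deferred to the end of the trace.

```python
class Tensor:
    """Minimal stand-in for a mutable tensor (illustrative only)."""

    def __init__(self, data):
        self.data = list(data)

    def __add__(self, other):
        # Out-of-place add: produces a fresh value, mutates nothing.
        return Tensor(x + y for x, y in zip(self.data, other.data))

    def copy_(self, src):
        # In-place writeback, analogous to torch.Tensor.copy_.
        self.data[:] = src.data
        return self


def f_functionalized(a, b):
    # "a += b" is traced as an out-of-place add...
    t0 = a + b
    result = Tensor(x * 2 for x in t0.data)
    # ...and the copy back into the argument is moved to the trace's end.
    a.copy_(t0)
    return result
```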

Multiple in-place ops whose operand is the function's argument are not handled appropriately.

```python
import torch
import thunder


def f(a):
    return a.exp_().sin_()


if __name__ == "__main__":
    x = torch.randn(4, device="cuda", requires_grad=False)...
```
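A minimal pure-Python sketch of what handling this would mean (`f_functionalized` is illustrative, not Thunder's actual transform): each chained in-place op on the argument is computed out-of-place, leaving a single writeback into the argument at the end.

```python
import math


def f_functionalized(a):
    # Out-of-place equivalent of a.exp_().sin_():
    t0 = [math.exp(x) for x in a]    # a.exp_()
    t1 = [math.sin(x) for x in t0]   # .sin_()
    a[:] = t1                        # one writeback into the caller's buffer
    return a
```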

> Is there anything wrong with that?

No, but it would look better if the variable names were like `params_0`, `grads_1`, `exp_avgs_2`.