Chien-Chin Huang
@svekars Can we close the issue since we already provided the document?
Current CP only supports SDPA. This error is from SDPA, indicating that it cannot find an available kernel. We only support the memory-efficient, flash, and cuDNN attention backends.
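For reference, a minimal sketch of restricting SDPA to those supported backends with PyTorch's `sdpa_kernel` context manager; the tensor shapes here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Restrict SDPA to the backends CP supports; if none of them can serve
# this input, SDPA raises the "no available kernel" error seen above.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
                  SDPBackend.EFFICIENT_ATTENTION,
                  SDPBackend.CUDNN_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```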
@lcvcl This PR looks like it was submitted accidentally. Please let us know if that's the case; I'll close the PR later if there are no further action items.
@githubsgi You can check https://github.com/pytorch/torchtitan/blob/main/docs/extension.md#extending-jobconfig. This should meet your goal without adding new fields to the main JobConfig.
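A minimal sketch of that extension pattern, assuming the custom-config mechanism described in the linked doc; the module, dataclass, and field names here are all hypothetical:

```python
# hypothetical my_configs.py — register it via the mechanism in the linked
# doc rather than editing torchtitan's main JobConfig.
from dataclasses import dataclass, field

@dataclass
class MyFeature:
    enabled: bool = False       # hypothetical custom knob
    scale_factor: float = 1.0   # hypothetical custom knob

@dataclass
class JobConfig:
    my_feature: MyFeature = field(default_factory=MyFeature)
```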
There are no profiler labels for some of the parallelisms at the moment. We can go through these parallelisms to understand whether labeling them is reasonable and how clear the labels would be...
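As a reference point, a hedged sketch of how such a label could be attached with `torch.profiler.record_function`; the label string and the wrapped computation are illustrative assumptions:

```python
import torch
from torch.profiler import record_function

def tp_region(x: torch.Tensor) -> torch.Tensor:
    # The label shows up as a named range in profiler traces, making this
    # parallelism's cost visible; "TP::forward" is a hypothetical name.
    with record_function("TP::forward"):
        return x @ x.T
```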
Yes, we probably have to work around the BC issue, since it is caused by the AdamW change.
@mingdianliu Could it be possible that the activations dominate the memory usage under such a setting? For a 7B model, even if we use float32, the parameters + gradients...
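A back-of-the-envelope version of that arithmetic, assuming unsharded float32 training with AdamW (the optimizer choice is an assumption):

```python
params = 7e9            # 7B parameters
bytes_per_elem = 4      # float32
weights = params * bytes_per_elem    # ~26 GiB
grads = params * bytes_per_elem      # ~26 GiB
optim = 2 * params * bytes_per_elem  # AdamW exp_avg + exp_avg_sq, ~52 GiB
total_gib = (weights + grads + optim) / 2**30
print(f"{total_gib:.0f} GiB before any activations")  # ~104 GiB
```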
Caching is not an issue because that memory will be reused for other tensor allocations. But this will not cause OOM because, when new tensors are created, PyTorch will first...
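A small sketch illustrating the caching-allocator behavior being described; the tensor shapes are arbitrary:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")
del x  # the block returns to PyTorch's caching allocator, not to the driver

# allocated drops, reserved stays high: the cached block is free for reuse
print(torch.cuda.memory_allocated())  # bytes in live tensors (now ~0)
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

y = torch.randn(4096, 4096, device="cuda")  # reuses the cached block, so no
                                            # new cudaMalloc and no OOM pressure
```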
Echoing @xmfan's comment: the compile flag must be set to false to enable the compiler toolkit, which seems counterintuitive.