Chien-Chin Huang
@svekars Can we close the issue since we already provided the document?
Current CP only supports SDPA. This error is from SDPA, indicating that it cannot find an available kernel. We only support the memory-efficient, flash, and cuDNN attention backends.
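For reference, a minimal sketch of restricting SDPA to those supported backends with PyTorch's `sdpa_kernel` context manager; the tensor shapes here are illustrative assumptions:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q = k = v = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.bfloat16)

# Restrict SDPA to the backends CP supports; if none of them can serve
# this input, SDPA raises the "no available kernel" error seen above.
with sdpa_kernel([SDPBackend.FLASH_ATTENTION,
                  SDPBackend.EFFICIENT_ATTENTION,
                  SDPBackend.CUDNN_ATTENTION]):
    out = F.scaled_dot_product_attention(q, k, v)
```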
@lcvcl This PR looks like it was submitted accidentally. Please let us know if that's the case; I'll close the PR later if there are no further action items.
@githubsgi You can check https://github.com/pytorch/torchtitan/blob/main/docs/extension.md#extending-jobconfig. This should meet your goal without adding new fields to the main JobConfig.
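A minimal sketch of that extension pattern, assuming the custom-config mechanism described in the linked doc; the module, dataclass, and field names here are all hypothetical:

```python
# hypothetical my_configs.py — register it via the mechanism in the linked
# doc rather than editing torchtitan's main JobConfig.
from dataclasses import dataclass, field

@dataclass
class MyFeature:
    enabled: bool = False       # hypothetical custom knob
    scale_factor: float = 1.0   # hypothetical custom knob

@dataclass
class JobConfig:
    my_feature: MyFeature = field(default_factory=MyFeature)
```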
There are no profiler labels for some of the parallelisms at the moment. We can go through these parallelisms to understand whether labeling them is reasonable and how clear the labels would be...
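As a reference point, a hedged sketch of how such a label could be attached with `torch.profiler.record_function`; the label string and the wrapped computation are illustrative assumptions:

```python
import torch
from torch.profiler import record_function

def tp_region(x: torch.Tensor) -> torch.Tensor:
    # The label shows up as a named range in profiler traces, making this
    # parallelism's cost visible; "TP::forward" is a hypothetical name.
    with record_function("TP::forward"):
        return x @ x.T
```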
Yes, we probably have to work around the BC issue, since it is caused by the AdamW change.
@mingdianliu Could it be possible that the activations dominate the memory usage under such a setting? For a 7B model, even if we use float32, the parameters + gradients...
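A back-of-the-envelope version of that arithmetic, assuming unsharded float32 training with AdamW (the optimizer choice is an assumption):

```python
params = 7e9            # 7B parameters
bytes_per_elem = 4      # float32
weights = params * bytes_per_elem    # ~26 GiB
grads = params * bytes_per_elem      # ~26 GiB
optim = 2 * params * bytes_per_elem  # AdamW exp_avg + exp_avg_sq, ~52 GiB
total_gib = (weights + grads + optim) / 2**30
print(f"{total_gib:.0f} GiB before any activations")  # ~104 GiB
```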
Caching is not an issue because that memory will be reused for other tensor allocations. But this will not cause OOM because, when new tensors are created, PyTorch will first...
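A small sketch illustrating the caching-allocator behavior being described; the tensor shapes are arbitrary:

```python
import torch

x = torch.randn(4096, 4096, device="cuda")
del x  # the block returns to PyTorch's caching allocator, not to the driver

# allocated drops, reserved stays high: the cached block is free for reuse
print(torch.cuda.memory_allocated())  # bytes in live tensors (now ~0)
print(torch.cuda.memory_reserved())   # bytes held by the caching allocator

y = torch.randn(4096, 4096, device="cuda")  # reuses the cached block, so no
                                            # new cudaMalloc and no OOM pressure
```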
Echoing @xmfan's comment: the compile flag must be set to false to enable the compiler toolkit, which seems counterintuitive.