tianyu-l

Results: 33 comments by tianyu-l

> just curious, is this gonna land soon or does it have some risk or unfinished business?
>
> also looks like this could use a rebase. i got a...

> oh, is this related to dispatching for complex numbers by any chance?

@wconstab Possibly, we don't know. The `aten.mul` op returns bad results when the inputs are raw torch.Tensor (desugared...

@ad8e I checked some other mainstream repos on how MFU is computed. From what I can tell, most (if not all) of them use 12. For example:

- nanoGPT:...
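
For reference, a minimal sketch of the FLOPs-per-token estimate that the coefficient 12 comes from (following the formula popularized by the PaLM paper and nanoGPT's `estimate_mfu`; the function and variable names here are illustrative):

```python
def flops_per_token(num_params, n_layers, n_heads, head_dim, seq_len):
    """Approximate training FLOPs per token (forward + backward)."""
    # 6 * N counts the matmuls over the model's dense parameters;
    # 12 * L * H * Q * T adds the attention score/value matmuls,
    # which grow with sequence length rather than parameter count.
    return 6 * num_params + 12 * n_layers * n_heads * head_dim * seq_len


# MFU = achieved FLOPs / hardware peak FLOPs.
# Numbers below are illustrative (A100 bf16 peak is ~312 TFLOPs).
achieved = 5000 * flops_per_token(
    num_params=7e9, n_layers=32, n_heads=32, head_dim=128, seq_len=4096
)  # tokens/sec * FLOPs/token
print(f"MFU: {achieved / 312e12:.1%}")
```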

> cc: @tianyu-l is this issue done?

Seems not. @awgu https://github.com/pytorch/torchtitan/blob/main/torchtitan/lr_scheduling.py#L10

Got a similar feature request offline: generation examples, plus some evals integration.

Agree that these are all nice to have, especially 1 and 2, which I've also thought about! One more concern, inspired by the MAST experience: MAST will repeatedly try launching the...

> ```python
> hf_ds = HuggingFaceDataset(
>     dataset_name, dataset_path, tokenizer, seq_len, world_size, rank, infinite
> )
> if shuffle:
>     hf_ds._data = hf_ds._data.shuffle(seed=int(rank * 10007 + int(time.time())))
> ```

@XinDongol For a map-style dataset, this...
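
As a point of reference, a minimal sketch of how `shuffle` differs between map-style and streaming datasets in the Hugging Face `datasets` library (the dataset name here is illustrative):

```python
from datasets import load_dataset

# Map-style dataset: shuffle() computes a full permutation of indices,
# so a single seed deterministically reorders the whole dataset.
ds = load_dataset("allenai/c4", "en", split="train")
ds = ds.shuffle(seed=42)

# Streaming (iterable) dataset: there is no random access, so shuffle()
# only mixes samples within a fixed-size buffer.
ds_stream = load_dataset("allenai/c4", "en", split="train", streaming=True)
ds_stream = ds_stream.shuffle(seed=42, buffer_size=10_000)
```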

> I was wondering, any idea to not use `.skip()` when resuming training? In my setup (& colab), skipping 10000000 samples took 90s approximately.

@TJ-Solergibert
1. We should use `.skip()`...
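
For context, a minimal sketch of why `.skip()` is slow here (assuming the Hugging Face `datasets` streaming API; the dataset name is illustrative):

```python
from datasets import load_dataset

# A streaming dataset is an iterator over remote shards with no random
# access, so skip(n) must pull and discard n samples one by one:
# resuming 10M samples in means re-reading 10M samples.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
resumed = ds.skip(10_000_000)  # O(n) walk from the beginning
```

A checkpointable dataloader that saves its position in the stream (cf. the TODO referencing #279 below) avoids this linear walk on resume.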

Closing, as we don't plan to support fp16 and have removed the grad scaler from the code.

TODO: test correctness of checkpointable data loading in #279