tianyu-l

Results: 33 comments by tianyu-l

> just curious, is this gonna land soon or does it have some risk or unfinished business?
>
> also looks like this could use a rebase. i got a...

> oh, is this related to dispatching for complex numbers by any chance?

@wconstab Possibly, we don't know. The `aten.mul` op returns bad results when the inputs are raw torch.Tensor (desugared...

@ad8e I checked some other mainstream repos on how MFU is computed. From what I can tell, most (if not all) of them use 12. For example:

- nanoGPT:...
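
For reference, a minimal sketch of the FLOPs-per-token estimate that the coefficient 12 comes from (following the formula popularized by the PaLM paper and nanoGPT's `estimate_mfu`; the function and variable names here are illustrative):

```python
def flops_per_token(num_params, n_layers, n_heads, head_dim, seq_len):
    """Approximate training FLOPs per token (forward + backward)."""
    # 6 * N counts the matmuls over the model's dense parameters;
    # 12 * L * H * Q * T adds the attention score/value matmuls,
    # which grow with sequence length rather than parameter count.
    return 6 * num_params + 12 * n_layers * n_heads * head_dim * seq_len


# MFU = achieved FLOPs / hardware peak FLOPs.
# Numbers below are illustrative (A100 bf16 peak is ~312 TFLOPs).
achieved = 5000 * flops_per_token(
    num_params=7e9, n_layers=32, n_heads=32, head_dim=128, seq_len=4096
)  # tokens/sec * FLOPs/token
print(f"MFU: {achieved / 312e12:.1%}")
```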

> cc: @tianyu-l is this issue done?

Seems not. @awgu https://github.com/pytorch/torchtitan/blob/main/torchtitan/lr_scheduling.py#L10

Got a similar feature request offline: generation examples, plus some evals integration.

Agree that these are all nice to have, especially 1 and 2, which I've also thought about! One more concern, inspired by the MAST experience: MAST will repeatedly try launching the...

> ```python
> hf_ds = HuggingFaceDataset(
>     dataset_name, dataset_path, tokenizer, seq_len, world_size, rank, infinite
> )
> if shuffle:
>     hf_ds._data = hf_ds._data.shuffle(seed=int(rank * 10007 + int(time.time())))
> ```

@XinDongol For a map-style dataset, this...
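
As a point of reference, a minimal sketch of how `shuffle` differs between map-style and streaming datasets in the Hugging Face `datasets` library (the dataset name here is illustrative):

```python
from datasets import load_dataset

# Map-style dataset: shuffle() computes a full permutation of indices,
# so a single seed deterministically reorders the whole dataset.
ds = load_dataset("allenai/c4", "en", split="train")
ds = ds.shuffle(seed=42)

# Streaming (iterable) dataset: there is no random access, so shuffle()
# only mixes samples within a fixed-size buffer.
ds_stream = load_dataset("allenai/c4", "en", split="train", streaming=True)
ds_stream = ds_stream.shuffle(seed=42, buffer_size=10_000)
```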

> I was wondering, any idea to not use `.skip()` when resuming training? In my setup (& colab), skipping 10000000 samples took 90s approximately.

@TJ-Solergibert
1. We should use `.skip()`...
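
For context, a minimal sketch of why `.skip()` is slow here (assuming the Hugging Face `datasets` streaming API; the dataset name is illustrative):

```python
from datasets import load_dataset

# A streaming dataset is an iterator over remote shards with no random
# access, so skip(n) must pull and discard n samples one by one:
# resuming 10M samples in means re-reading 10M samples.
ds = load_dataset("allenai/c4", "en", split="train", streaming=True)
resumed = ds.skip(10_000_000)  # O(n) walk from the beginning
```

A checkpointable dataloader that saves its position in the stream (cf. the TODO referencing #279 below) avoids this linear walk on resume.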

Closing, as we don't plan to support fp16 and have removed the grad scaler from the code.

TODO: test correctness of checkpointable data loading in #279