I've also found that for a 128K vocab, 8 chunks can be faster, at the cost of nearly...
**UPDATE:** Just trained three 370M models on 10B tokens of FineWeb-Edu with 8K context length and a 32K vocab. Below are the results. Surprisingly, the 8-chunk setting exhibits the best ppl. V/H = 32K/1K = 32...
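For reference, here is a minimal sketch of the kind of chunked linear + cross-entropy computation being compared above. The function name, shapes, and the choice to chunk along the flattened token dimension are my own illustration; the actual fused kernel is implemented differently:

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, labels, num_chunks=8):
    """Compute the lm_head projection + cross-entropy chunk by chunk, so the
    full (num_tokens, vocab_size) logits matrix is never materialized at once.

    hidden: (N, H) final hidden states, weight: (V, H) lm_head weight,
    labels: (N,) target token ids.
    """
    total = hidden.new_zeros(())
    count = 0
    for h, y in zip(hidden.chunk(num_chunks), labels.chunk(num_chunks)):
        logits = h @ weight.t()  # (n, V) logits for this chunk only
        total = total + F.cross_entropy(logits, y, reduction="sum")
        count += y.numel()
    return total / count
```

The chunk count trades peak memory for per-chunk overhead, which is why the sweet spot shifts with vocab size.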
> The time it takes to resume depends on the expected maximum distance in this case, right? Do you know its relationship with $B$?

Hi, I created a histogram...
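For a buffer of size $B$ with the usual streaming shuffle (emit a uniformly random buffered item, then replace it with the next source item), each item's delay until emission is heuristically geometric with mean about $B$, so the maximum delay over $n$ emissions grows roughly like $B \log n$. A small simulation sketch (my own illustration, not the script used for the histogram above):

```python
import random

def shuffle_buffer_delays(n_examples, buffer_size, seed=42):
    """Simulate a streaming shuffle buffer and record, for each emitted
    example, how many source positions it sat in the buffer."""
    rng = random.Random(seed)
    buffer, delays = [], []
    for pos in range(n_examples):
        if len(buffer) < buffer_size:
            buffer.append(pos)  # fill phase: no emissions yet
            continue
        i = rng.randrange(buffer_size)
        delays.append(pos - buffer[i])  # how long this item waited
        buffer[i] = pos
    return delays

delays = shuffle_buffer_delays(1_000_000, buffer_size=1024)
print(max(delays), sum(delays) / len(delays))  # max vs. mean delay
```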
Maybe there's a middle ground between rebuilding the buffer from scratch and storing the entire buffer, but the logic is a bit complicated and takes time to implement. At least...
@lhoestq I'm not sure it's OK to use a progress bar with multiple workers. How about passing an arg `resumable=True` to `IterableDataset.shuffle` to control this behavior?
@lhoestq

> Loading from disk is a good option for this (although it's not always possible to serialize the content of the buffer, in that case the buffer would restart...
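A minimal sketch of what recovering the buffer from the source could look like, assuming the source can be re-iterated deterministically. The class and method names are hypothetical, not `datasets` API; the point is that the checkpoint stores only source positions and RNG state, never the examples themselves:

```python
import random
from itertools import islice

class RecoverFromSourceBuffer:
    def __init__(self, make_source, buffer_size, seed=0):
        self.make_source = make_source  # () -> a fresh, deterministic iterator
        self.buffer_size = buffer_size
        self.rng = random.Random(seed)
        self.positions = []   # source position of each buffered example
        self.items = []       # the buffered examples themselves
        self.next_pos = 0     # next source position to read

    def state_dict(self):
        # No examples serialized: positions + RNG state are always picklable.
        return {"positions": list(self.positions),
                "next_pos": self.next_pos,
                "rng": self.rng.getstate()}

    def load_state_dict(self, state):
        self.rng.setstate(state["rng"])
        self.positions = list(state["positions"])
        self.next_pos = state["next_pos"]
        wanted = set(self.positions)
        # Replay the source up to the checkpointed offset, keeping only
        # the positions that were sitting in the buffer.
        by_pos = {p: x
                  for p, x in enumerate(islice(self.make_source(), self.next_pos))
                  if p in wanted}
        self.items = [by_pos[p] for p in self.positions]
```

The cost of resuming is one pass over the first `next_pos` source examples, which ties back to the maximum-distance discussion above.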
@lhoestq

> Are you ok with adding `buffer_resuming_mode=` to `.shuffle()` to enable buffer recovering using your method with `buffer_resuming_mode="recover_from_source"`? (feel free to suggest other names for the parameter and...
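Under that proposal, usage would presumably look like the following (the `buffer_resuming_mode` parameter is only a suggestion from this thread, not an existing `datasets` argument; the dataset name is a placeholder):

```python
from datasets import load_dataset

ds = load_dataset("some/dataset", split="train", streaming=True)
ds = ds.shuffle(seed=42, buffer_size=1024,
                buffer_resuming_mode="recover_from_source")  # proposed, not released
```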
Hi @huyiwen, I think the smoothest migration path is to define HF-style models and use torchtitan for training with 4D parallelism. You may also be interested in https://github.com/fla-org/flame.git,...
Hi @kwen2501,

> Dumb q: would HF-style model definition enable composability with HF Trainer? Did HF document the style requirement somewhere?

HF-style models (e.g., `AutoModelForCausalLM`) also inherit from `nn.Module`, making...
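A minimal sketch of what "HF-style" means here, with a deliberately toy architecture and hypothetical names. Because `PreTrainedModel` is itself an `nn.Module`, the same class can be handed to HF `Trainer` or to a plain-PyTorch loop such as torchtitan's:

```python
import torch.nn as nn
from transformers import AutoConfig, AutoModelForCausalLM, PretrainedConfig, PreTrainedModel
from transformers.modeling_outputs import CausalLMOutput

class TinyConfig(PretrainedConfig):
    model_type = "tiny-lm"  # hypothetical, for illustration only

    def __init__(self, vocab_size=32000, hidden_size=1024, **kwargs):
        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

class TinyForCausalLM(PreTrainedModel):  # PreTrainedModel subclasses nn.Module
    config_class = TinyConfig

    def __init__(self, config):
        super().__init__(config)
        self.embed = nn.Embedding(config.vocab_size, config.hidden_size)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)

    def forward(self, input_ids, labels=None, **kwargs):
        logits = self.lm_head(self.embed(input_ids))
        loss = None
        if labels is not None:  # HF Trainer passes labels and reads .loss
            loss = nn.functional.cross_entropy(
                logits[:, :-1].reshape(-1, logits.size(-1)),
                labels[:, 1:].reshape(-1))
        return CausalLMOutput(loss=loss, logits=logits)

# Registration makes the toy model loadable via the Auto* classes.
AutoConfig.register("tiny-lm", TinyConfig)
AutoModelForCausalLM.register(TinyConfig, TinyForCausalLM)
```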
@tesla3 Hi, check out this PR; I'm working on it.