Comments of Ethan He (36 results)
You need to use mcore models; local is being deprecated.
It's handled by TEnorm
Generally, the mask will be created inside Transformer Engine if `--use-mcore-models` is set.
You can use `--data-cache-path` to specify where you want to cache, and precompute it using a single node. https://github.com/NVIDIA/Megatron-LM/blob/9de386d08770d7296263a590171ace4ae45348ad/megatron/training/arguments.py#L1349-L1350
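A sketch of how this might look on the command line; the script name and the other flags shown here are placeholders for your own launch command, only `--data-cache-path` is the flag being discussed:

```shell
# Step 1 (single node): build the index cache once and write it to shared storage.
# /shared/megatron-data-cache is a hypothetical path on a filesystem all nodes can read.
python pretrain_gpt.py \
    --data-path /data/my-corpus_text_document \
    --data-cache-path /shared/megatron-data-cache \
    ... # your usual model/training args

# Step 2 (multi-node run): point every node at the same cache path so the
# precomputed indices are reused instead of being rebuilt per node.
```
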
TBH, I don't exactly remember the details. You can try removing stopgrad and comparing the performance.
(1) tokens = seq_len * consumed samples
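The relation in (1) can be sketched as a one-liner; the numbers below are illustrative, not from the original thread:

```python
def total_tokens(seq_len: int, consumed_samples: int) -> int:
    """Total tokens seen so far: seq_len * consumed samples, per (1)."""
    return seq_len * consumed_samples

# e.g. a 4096-token sequence length after 1M consumed samples
print(total_tokens(4096, 1_000_000))
```
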