Megatron-LM
[QUESTION] How to pre-build the dataset's index?
I want to avoid using a compute node for this task:
> WARNING: could not find index map files, building the indices on rank 0 ...
> elasped time to build and save doc-idx mapping (seconds): 270.614145
You can use --data-cache-path to specify where the index files are cached, and precompute them ahead of time on a single node.
https://github.com/NVIDIA/Megatron-LM/blob/9de386d08770d7296263a590171ace4ae45348ad/megatron/training/arguments.py#L1349-L1350
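As a rough sketch of that suggestion, the snippet below assembles a launch command that points --data-cache-path at a shared directory, so a one-off single-node run can write the index files there first. The helper build_precompute_command and both example paths are hypothetical and only for illustration; the remaining arguments your training script needs are passed through extra_args. It is an assumption here that the later multi-node job must use the same dataset-related settings (for example sequence length, seed, and split) for the cached indices to be picked up.

```python
import shlex


def build_precompute_command(train_script, data_path, cache_dir, extra_args=()):
    """Assemble a single-node launch command that writes index files to cache_dir.

    Hypothetical helper for illustration only. It adds just --data-path and
    --data-cache-path; every other argument the script requires must be
    supplied by the caller through extra_args.
    """
    return [
        "python",
        str(train_script),
        "--data-path",
        str(data_path),
        "--data-cache-path",
        str(cache_dir),
        *extra_args,
    ]


# Example paths are placeholders, not values from this thread.
cmd = build_precompute_command(
    "pretrain_gpt.py",
    "/data/my_corpus_text_document",
    "/shared/index_cache",
)

# Print the command so it can be reviewed before launching it on one node.
print(shlex.join(cmd))
```

Running the printed command once on a single node should leave the index files under the cache directory; the full job can then be started with the same --data-cache-path value.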