
[QUESTION] How many tokens does one consumed sample correspond to in training? And how to count tokens after converting a JSON file to .bin and .idx files?

Open 13416157913 opened this issue 7 months ago • 3 comments

Hello, (1) How many tokens does one consumed sample correspond to during training? (2) How do I compute the total number of tokens after a JSON file has been converted to .bin and .idx files?

13416157913 avatar Dec 02 '23 06:12 13416157913

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Jan 31 '24 18:01 github-actions[bot]

(1) tokens = seq_len * consumed samples, since each consumed sample is one training sequence of seq_len tokens.
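As a rough illustration of that formula, here is a minimal Python sketch. The function names are hypothetical helpers, not part of Megatron-LM, and the iteration-based estimate assumes a constant global batch size (no batch-size ramp-up) throughout training.

```python
def tokens_consumed(consumed_samples: int, seq_len: int) -> int:
    """Tokens seen so far, assuming every sample is one sequence of seq_len tokens."""
    if consumed_samples < 0 or seq_len <= 0:
        raise ValueError("consumed_samples must be >= 0 and seq_len must be > 0")
    return consumed_samples * seq_len


def samples_consumed(iterations: int, global_batch_size: int) -> int:
    """Samples seen so far, assuming a constant global batch size (hypothetical helper)."""
    if iterations < 0 or global_batch_size <= 0:
        raise ValueError("iterations must be >= 0 and global_batch_size must be > 0")
    return iterations * global_batch_size


if __name__ == "__main__":
    # Illustrative numbers only: 1000 iterations, global batch 512, seq_len 2048.
    samples = samples_consumed(iterations=1000, global_batch_size=512)
    print(samples)                          # 512000
    print(tokens_consumed(samples, 2048))   # 1048576000
```

If the run uses batch-size ramp-up, the consumed-samples counter reported in the training log is the safer input than an iteration count.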

ethanhe42 avatar Jan 31 '24 18:01 ethanhe42

Marking as stale. No activity in 60 days.

github-actions[bot] avatar Apr 01 '24 18:04 github-actions[bot]