Linsong Chu
We have noticed a training slowdown that was introduced by our dataloader. On further checking, we traced the issue to the fact that our dataset class is...
## Scope

This write-up only applies to "initial model init". For cases that require loading a checkpoint (continue-pretraining, fine-tuning and inference), this is not needed as any init would be...
We recently added a commit that raises the Dynamo accumulated cache size limit so that compile works with large models such as 70b, whose num_layer is greater than the default limit (64): https://github.com/foundation-model-stack/fms-fsdp/pull/45#issuecomment-2002564455....
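For readers unfamiliar with this setting, here is a minimal sketch of what raising the limit could look like. It is not the code from the linked PR. The helper name `raise_accumulated_cache_limit`, the `headroom` parameter, the rule "limit must be at least the layer count", and the layer count of 80 are assumptions made for illustration. A `SimpleNamespace` stands in for `torch._dynamo.config` so the sketch runs without PyTorch installed.

```python
from types import SimpleNamespace

# Default limit quoted in the text above.
DEFAULT_ACCUMULATED_LIMIT = 64


def raise_accumulated_cache_limit(config, num_layers, headroom=0):
    """Raise config.accumulated_cache_size_limit when num_layers exceeds it.

    Hypothetical helper: `config` is any object exposing the attribute
    `accumulated_cache_size_limit` (for example torch._dynamo.config).
    Returns the limit in effect after the call.
    """
    current = getattr(
        config, "accumulated_cache_size_limit", DEFAULT_ACCUMULATED_LIMIT
    )
    # Assumption: the limit needs to cover at least one entry per layer.
    needed = num_layers + headroom
    if needed > current:
        config.accumulated_cache_size_limit = needed
        return needed
    # Never lower a limit that is already large enough.
    return current


# Stand-in for torch._dynamo.config; 80 layers is an illustrative value.
fake_config = SimpleNamespace(
    accumulated_cache_size_limit=DEFAULT_ACCUMULATED_LIMIT
)
new_limit = raise_accumulated_cache_limit(fake_config, num_layers=80)
```

With PyTorch installed, the same helper could be called as `raise_accumulated_cache_limit(torch._dynamo.config, num_layers)` after `import torch._dynamo` and before `torch.compile` is invoked. Only raising the value, never lowering it, keeps the call safe if another part of the code has already set a larger limit.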
Add a flop counter to the code, controlled by a bool flag. It is already available in the flop_counter branch but will require some extra work to prettify it and integrate...
This happened once before and got fixed: https://github.com/EleutherAI/lm-evaluation-harness/issues/898

But now it seems to be broken again with the same error, at least on my end.

```bash
File "/home/lchu/.conda/envs/main/lib/python3.9/site-packages/datasets/builder.py", line 1726, in _prepare_split_single...
```