German Abramov
For example, if I'm training on 2 nodes, should I have checkpoints for both rank 0 and rank 1? I have `save_filename: ep{epoch}-ba{batch}-rank{rank}.pt`, but checkpoints are only being saved for node 0 with rank...
Do you know why I get this problem with `pretrain_gpt_single_node.sh`? I'm setting `N_GPUS=1` and get ``` File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 191, in _get_group_rank raise RuntimeError("The given group does not exist") RuntimeError:...
Hey! I'd like to train some ResNet models using the ImageNet dataset, so I've git cloned imagenetloader.torch to my PC (OS: Windows 10). But when I launch the **valprep.sh** file, it...
Hi, Llama 3 trains like this > We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries. I see you...
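The masking the quote describes can be sketched as a causal mask intersected with a same-document mask over a packed sequence. This is an illustrative standalone sketch, not code from any particular training library:

```python
def doc_boundary_mask(doc_ids):
    """Causal attention mask that also blocks attention across
    document boundaries: query position q may attend to key
    position k only if k <= q AND both tokens come from the
    same packed document."""
    n = len(doc_ids)
    return [[k <= q and doc_ids[q] == doc_ids[k] for k in range(n)]
            for q in range(n)]

# Two packed documents of lengths 3 and 2 in one 5-token sequence.
mask = doc_boundary_mask([0, 0, 0, 1, 1])
```

Here `mask[3][2]` is False even though position 2 precedes position 3, because the tokens belong to different documents.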
Hi, it looks like recent versions of llm-foundry (updated from master) have had issues in the last week or two. I get an error with a config like this ``` train_loader: dataset: max_seq_len: 2048 shuffle: true shuffle_seed: 17...
Hi! I'm trying to merge the index.jsons into one, so I have a folder ``` dataset/ part.00000/ train/ index.json shard.00000.mds … val/ index.json shard.00000.mds … part.00001/ train/ index.json shard.00000.mds … val/ index.json...
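One way to approach a layout like the one above is to concatenate the per-part indexes manually, re-pointing each shard at its original subdirectory. This sketch assumes the MDS v2 index layout (`{"version": 2, "shards": [...]}`, each shard naming its files under `raw_data`/`zip_data`); the function name and arguments are hypothetical, and newer `streaming` releases may ship their own merge utility that should be preferred:

```python
import json
import os

def merge_mds_indexes(root, split, parts, out_path):
    """Combine index.json files from several MDS part directories
    into one index whose shard entries point back at the original
    part folders (so no shard files need to be moved)."""
    merged = {"version": 2, "shards": []}
    for part in parts:  # e.g. ["part.00000", "part.00001"]
        part_dir = os.path.join(root, part, split)
        with open(os.path.join(part_dir, "index.json")) as f:
            index = json.load(f)
        for shard in index["shards"]:
            # Re-point shard files at their original subdirectory.
            for key in ("raw_data", "zip_data"):
                if shard.get(key):
                    shard[key]["basename"] = os.path.join(
                        part, split, shard[key]["basename"])
            merged["shards"].append(shard)
    with open(out_path, "w") as f:
        json.dump(merged, f)
```

The merged index then lives at the dataset root while the shards stay where they are.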
Hi! Your benchmarks are functioning well with version 0.3.0 of lm-evaluation-harness. Are there any plans to update and support version 0.4.0?
Hello, I'm currently training LLaMA PRO. Initially, I expanded the model from 32 layers to 40 layers and proceeded to train only the newly added 8 layers (every fifth layer)....
Hi! Do you support the fill-in-the-middle (FIM) technique in pretraining pipelines? If yes, do you have some documentation about this? Thanks!
Hello, I'm running a 7B model with a 32k context size and seeing unexpected memory scaling behavior. Here's the situation: - **Config**: same overall setup, only changing `global_batch_size`. - **Case...
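A back-of-envelope model helps frame questions like this: parameter and optimizer memory are constant, while activation memory grows linearly with per-GPU micro-batch size times sequence length, so at 32k context activations usually dominate. All constants below are illustrative assumptions, not measurements of any particular stack:

```python
def rough_activation_gib(micro_batch, seq_len, hidden, n_layers,
                         bytes_per_el=2, acts_per_layer=16):
    """Very rough transformer activation estimate (no activation
    checkpointing): assume ~acts_per_layer tensors of shape
    (micro_batch, seq_len, hidden) are kept per layer for backward.
    acts_per_layer=16 and bytes_per_el=2 (bf16) are assumptions."""
    els = micro_batch * seq_len * hidden * n_layers * acts_per_layer
    return els * bytes_per_el / 2**30

# 7B-ish shapes (hidden 4096, 32 layers) at 32k context:
a1 = rough_activation_gib(1, 32768, 4096, 32)  # → 128.0 GiB
a2 = rough_activation_gib(2, 32768, 4096, 32)  # → 256.0 GiB, linear in micro-batch
```

Under this model, raising `global_batch_size` only changes memory if it changes the per-GPU micro-batch; if gradient accumulation absorbs the increase, per-step memory should stay flat.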