German Abramov

21 issue results for German Abramov

For example, if I'm training on 2 nodes, should I have checkpoints from both rank 0 and rank 1? I have `save_filename: ep{epoch}-ba{batch}-rank{rank}.pt`, but checkpoints are only being saved for node 0 with rank...
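For context on the issue above: the `{rank}` placeholder in the template only produces distinct filenames if the trainer actually writes a checkpoint on every rank (many trainers write non-sharded checkpoints on rank 0 only, which would explain the missing file). A minimal sketch of how such a template expands per rank, with a hypothetical `format_checkpoint_name` helper:

```python
# Hypothetical helper (not from any specific library): expand a
# save_filename template like the one in the issue for each rank.
# If only rank 0 writes checkpoints, only the rank-0 name appears
# on disk even though {rank} is in the template.

def format_checkpoint_name(template: str, epoch: int, batch: int, rank: int) -> str:
    """Fill the placeholders of a save_filename template."""
    return template.format(epoch=epoch, batch=batch, rank=rank)

template = "ep{epoch}-ba{batch}-rank{rank}.pt"
names = [format_checkpoint_name(template, epoch=1, batch=500, rank=r) for r in range(2)]
print(names)  # ['ep1-ba500-rank0.pt', 'ep1-ba500-rank1.pt']
```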

Do you know why I get this error with `pretrain_gpt_single_node.sh`? I'm setting `N_GPUS=1` and get ``` File "/opt/conda/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 191, in _get_group_rank raise RuntimeError("The given group does not exist") RuntimeError:...

Hey! I'd like to train some ResNet models using this ImageNet dataset, so I've git cloned imagenetloader.torch to my PC (OS: Windows 10). But when I launch **valprep.sh**, it...

Hi, Llama 3 trains like this > We trained the models on sequences of 8,192 tokens, using a mask to ensure self-attention does not cross document boundaries. I see you...

enhancement
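The masking described in the Llama 3 issue above, where self-attention may not cross document boundaries, can be sketched as a block-diagonal causal mask over a packed sequence (a rough illustration, not Llama 3's actual implementation):

```python
# Sketch: tokens i and j may attend only if they belong to the same
# packed document (doc_ids[i] == doc_ids[j]) AND the attention is
# causal (j <= i). Real implementations build this on the GPU, e.g.
# via attention biases or variable-length attention kernels.

def doc_boundary_mask(doc_ids):
    """doc_ids[i] = index of the document token i belongs to.
    Returns mask[i][j] = True where attention is allowed."""
    n = len(doc_ids)
    return [[doc_ids[i] == doc_ids[j] and j <= i for j in range(n)]
            for i in range(n)]

# Two packed documents of lengths 3 and 2 in one 5-token sequence:
mask = doc_boundary_mask([0, 0, 0, 1, 1])
# The first token of document 1 cannot see document 0:
print(mask[3])  # [False, False, False, True, False]
```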

Hi, it looks like recent versions of llm-foundry (updated from master) have had regressions in the last week or two. I get an error like this ``` train_loader: dataset: max_seq_len: 2048 shuffle: true shuffle_seed: 17...

Hi! I'm trying to merge several index.json files into one. I have this folder layout: ``` dataset/ part.00000/ train/ index.json shard.00000.mds … val/ index.json shard.00000.mds … part.00001/ train/ index.json shard.00000.mds … val/ index.json...
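For the merging question above: newer versions of the streaming library ship a `merge_index` utility for exactly this, and that should be preferred. If it's unavailable, here is a rough manual sketch under the assumption of version-2 MDS indexes, where each index.json holds a `shards` list and each shard's `basename` must be rewritten to stay valid relative to the merged root (this is my assumption for the sketch, not the official utility's behavior):

```python
import json
import os

def merge_indexes(index_paths, out_path):
    """Naive sketch: concatenate the 'shards' lists of several MDS
    index.json files into one merged index, prefixing each shard's
    basename with the shard's directory relative to the merged root
    so filenames from different parts do not collide."""
    root = os.path.dirname(out_path)
    merged = {"version": 2, "shards": []}
    for path in index_paths:
        rel_dir = os.path.relpath(os.path.dirname(path), root)
        with open(path) as f:
            index = json.load(f)
        for shard in index["shards"]:
            for key in ("raw_data", "zip_data"):
                if shard.get(key):  # zip_data may be null for raw shards
                    shard[key]["basename"] = os.path.join(rel_dir, shard[key]["basename"])
            merged["shards"].append(shard)
    with open(out_path, "w") as f:
        json.dump(merged, f)
```

A streaming dataset pointed at the merged index would then resolve each shard through its part subdirectory.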

Hi! Your benchmarks work well with version 0.3.0 of lm-evaluation-harness. Are there any plans to update to and support version 0.4.0?

good first issue

Hello, I'm currently training LLaMA PRO. Initially, I expanded the model from 32 layers to 40 layers and proceeded to train only the newly added 8 layers (every fifth layer)....

question
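The LLaMA Pro setup described in the issue above, expanding 32 layers to 40 and training only the 8 new ones, amounts to inserting one identity-initialized layer after every 4 original layers, so every fifth layer in the expanded stack is new. A small sketch of that layer bookkeeping (my reading of the description, not the paper's actual code):

```python
# Sketch: mark which layers in the expanded stack are newly added
# (and hence trainable) vs. original (and hence frozen), assuming
# one new layer is inserted after every `group` original layers.

def expanded_layer_flags(n_orig=32, group=4):
    """Return a list over the expanded stack: True = newly added layer."""
    flags = []
    for i in range(n_orig):
        flags.append(False)           # original layer stays frozen
        if (i + 1) % group == 0:
            flags.append(True)        # inserted layer is trainable
    return flags

flags = expanded_layer_flags()
print(len(flags), sum(flags))         # 40 8
print([i for i, new in enumerate(flags) if new])  # [4, 9, 14, 19, 24, 29, 34, 39]
```

In a real training loop, one would set `requires_grad_(False)` on every layer whose flag is False.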

Hi! Do you support the fill-in-the-middle (FIM) technique in pretraining pipelines? If so, do you have any documentation about it? Thanks!

enhancement

Hello, I’m running a 7B model with a 32k context size and seeing unexpected memory scaling behavior. Here’s the situation: - **Config**: same overall setup, only changing `global_batch_size`. - **Case...
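As rough intuition for the memory question above: activation memory grows linearly with micro-batch size and sequence length, and at 32k context it dominates unless activation checkpointing is on. A back-of-envelope sketch with assumed 7B-class dimensions (hidden=4096, 32 layers, bf16, and a made-up count of ~16 stored activation tensors per layer); real numbers depend heavily on activation checkpointing and FlashAttention, so treat this as illustration only:

```python
# Back-of-envelope activation memory estimate. All constants are
# assumptions for a 7B-class model, not measured values.

def act_mem_gib(micro_bs, seq_len, hidden=4096, layers=32,
                bytes_per=2, acts_per_layer=16):
    """Rough: ~acts_per_layer tensors of shape [bs, seq, hidden]
    kept per layer for the backward pass, in bf16 (2 bytes)."""
    elems = micro_bs * seq_len * hidden * layers * acts_per_layer
    return elems * bytes_per / 2**30

for bs in (1, 2, 4):
    print(bs, act_mem_gib(bs, 32_768))  # 128.0, 256.0, 512.0 GiB
```

The point of the sketch: if `global_batch_size` changes translate into different per-device micro-batch sizes (rather than more gradient-accumulation steps), memory scales with them linearly, which is one common explanation for "unexpected" scaling.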