Enrico Shippole
> Hi Enrico, thanks for your response. On the note of FlashAttention not being possible on TPUs, does this imply that TPU context size/efficiency will be substantially behind GPUs...
Weights are downloading fine for me as well. Set the pre-signed URL in the bash script:

```bash
PRESIGNED_URL="" # Set your URL here
declare -A N_SHARD_DICT
N_SHARD_DICT["7B"]="0"
N_SHARD_DICT["13B"]="1"
N_SHARD_DICT["30B"]="3"
N_SHARD_DICT["65B"]="7"
```
...
I am working on putting together a FLAN dataset as well to upload to the HF hub. Training 7B and 13B LLaMA models on OIG at bf16 with no LoRA....
Uploaded these so far for FLAN:
- https://huggingface.co/datasets/conceptofmind/flan_niv2_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_fsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_submix
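If it's useful, here is a minimal sketch of pulling one of these datasets from the Hub with the `datasets` library; the `split="train"` argument is an assumption, so check the dataset card for the actual split names:

```python
from datasets import load_dataset

# Load one of the uploaded FLAN submixes from the Hugging Face Hub.
# The split name is an assumption; see the dataset card for the real splits.
ds = load_dataset("conceptofmind/flan_cot_zsopt", split="train")
print(ds[0])
```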
> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

I will try with main right...
> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

Looks like this is all set...
Hi @pacman100,

Thank you for the response. I will test out loading and saving the models with FSDP.

Best,
Enrico
> Yes, if the model is already wrapped in `FullyShardedDataParallel`, `accelerator.prepare` will just return the same

Thank you for the immediate response. I currently have the accelerate config as:...
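For reference, a minimal Python sketch of setting up FSDP through Accelerate's plugin API rather than the YAML config. These settings are illustrative assumptions, not the actual config from this thread, and the `use_orig_params` field requires a recent accelerate/PyTorch version:

```python
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Illustrative FSDP settings only, not the config referenced above.
fsdp_plugin = FullyShardedDataParallelPlugin(
    # Gather a full (unsharded) state dict on rank 0 when saving.
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
    # Keep the original parameter views; matches the use_orig_params=True
    # setting discussed below (availability depends on the accelerate version).
    use_orig_params=True,
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```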
Additionally, I am having an issue when trying to resume from a model checkpoint with `use_orig_params=True`:

```python
# Note: the condition needs `and`, not `or` — with `or` it is always true,
# so the resume branch runs even when no checkpoint is set.
if CFG.RESUME_FROM_CHECKPOINT is not None and CFG.RESUME_FROM_CHECKPOINT != "":
    accelerator.print(f"Resuming from checkpoint {CFG.RESUME_FROM_CHECKPOINT}")
    ...
```
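For context, a minimal sketch of the save/resume pattern I am assuming here, using Accelerate's `save_state`/`load_state`; the tiny model and checkpoint path are placeholders, not the actual training setup:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)

# Save model, optimizer, and RNG state together in one directory.
accelerator.save_state("checkpoints/latest")

# On resume, a plain truthiness check covers both None and "".
resume_path = "checkpoints/latest"
if resume_path:
    accelerator.load_state(resume_path)
```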
This has not been resolved and the PR addressing it was automatically closed.