Enrico Shippole

Results 155 comments of Enrico Shippole

> Hi Enrico, thanks for your response. On the note of flash attention not being possible on TPUs, does this imply that TPU context size/efficiency will be substantially behind GPUs...

Weights are downloading fine for me as well. Set the pre-signed URL in the bash script:

```bash
PRESIGNED_URL=""  # Set your URL here
declare -A N_SHARD_DICT
N_SHARD_DICT["7B"]="0"
N_SHARD_DICT["13B"]="1"
N_SHARD_DICT["30B"]="3"
N_SHARD_DICT["65B"]="7"
```
...
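For reference, each entry in `N_SHARD_DICT` is the highest (zero-based) shard index for that model size, which maps to the `consolidated.XX.pth` checkpoint files each model ships as. A minimal sketch of how the shard count expands into filenames (the loop is an illustration, not a copy of the official download script):

```shell
#!/usr/bin/env bash
# Highest shard index (zero-based) per model size
declare -A N_SHARD_DICT
N_SHARD_DICT["13B"]="1"

MODEL_SIZE="13B"
N_SHARD="${N_SHARD_DICT[$MODEL_SIZE]}"

# Build the per-shard checkpoint filenames, e.g. consolidated.00.pth
for i in $(seq 0 "${N_SHARD}"); do
  printf "consolidated.%02d.pth\n" "${i}"
done
```

So "13B" with index 1 yields `consolidated.00.pth` and `consolidated.01.pth`.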

I am working on putting together a FLAN dataset as well to upload to the HF hub. Training 7B and 13B LLaMA models on OIG in bf16, no LoRA....

Uploaded these so far for FLAN:
- https://huggingface.co/datasets/conceptofmind/flan_niv2_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_fsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_submix

> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

I will try with main right...

> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

Looks like this is all set...

Hi @pacman100 , Thank you for the response. I will test out loading and saving the models with FSDP. Best, Enrico

> Yes, if the model is already wrapped in `FullyShardedDataParallel`, `accelerator.prepare` will just return the same

Thank you for the immediate response. I currently have the accelerate config as:...
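The pass-through behavior described in the quote can be illustrated with a pure-Python stand-in (the `FSDPWrapped` class and `prepare` function below are hypothetical mocks, not the real `torch` or `accelerate` objects):

```python
class FSDPWrapped:
    """Stand-in for torch's FullyShardedDataParallel wrapper."""
    def __init__(self, module):
        self.module = module


def prepare(model):
    # Mirrors the behavior described above for `accelerator.prepare`:
    # an already-wrapped model is returned unchanged, otherwise it is wrapped.
    if isinstance(model, FSDPWrapped):
        return model
    return FSDPWrapped(model)


plain = object()
wrapped = prepare(plain)   # gets wrapped once
same = prepare(wrapped)    # already wrapped, returned as-is
assert same is wrapped
```

The practical upshot is that manually wrapping the model in FSDP before calling `prepare` should not double-wrap it.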

Additionally, I am having an issue when trying to resume from a model checkpoint with `use_orig_params=True`:

```python
if CFG.RESUME_FROM_CHECKPOINT is not None or CFG.RESUME_FROM_CHECKPOINT != "":
    accelerator.print(f"Resuming from checkpoint {CFG.RESUME_FROM_CHECKPOINT}")
```
...
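As an aside, the guard in the snippet above is always true when the checkpoint path is an empty string, because `or` is satisfied by the `is not None` side alone. A minimal sketch of the corrected condition (the bare `RESUME_FROM_CHECKPOINT` variable is a stand-in for `CFG.RESUME_FROM_CHECKPOINT`):

```python
RESUME_FROM_CHECKPOINT = ""  # stand-in for CFG.RESUME_FROM_CHECKPOINT

# Buggy: `or` passes whenever the value is not None, so "" still resumes
buggy = RESUME_FROM_CHECKPOINT is not None or RESUME_FROM_CHECKPOINT != ""

# Fixed: both conditions must hold before attempting to resume
fixed = RESUME_FROM_CHECKPOINT is not None and RESUME_FROM_CHECKPOINT != ""

assert buggy is True   # empty string incorrectly triggers resume
assert fixed is False  # empty string correctly skips resume
```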

This has not been resolved and the PR addressing it was automatically closed.