Enrico Shippole
> Hi Enrico, thanks for your response. On the note of FlashAttention not being possible on TPUs, does this imply that TPU context size/efficiency will be substantially behind GPUs...
Weights are downloading fine for me as well. Set the pre-signed URL in the bash script:

```bash
PRESIGNED_URL="" # Set your URL here
declare -A N_SHARD_DICT
N_SHARD_DICT["7B"]="0"
N_SHARD_DICT["13B"]="1"
N_SHARD_DICT["30B"]="3"
N_SHARD_DICT["65B"]="7"
```
...
I am working on putting together a FLAN dataset as well to upload to the HF hub. Training 7B and 13B LLaMA models on OIG at bf16 with no LoRA....
Uploaded these so far for FLAN:
- https://huggingface.co/datasets/conceptofmind/flan_niv2_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_fsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_zsopt
- https://huggingface.co/datasets/conceptofmind/flan_cot_submix
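If it's useful, here is a minimal sketch of pulling one of these datasets from the Hub with the `datasets` library; the `split="train"` argument is an assumption, so check the dataset card for the actual split names:

```python
from datasets import load_dataset

# Load one of the uploaded FLAN submixes from the Hugging Face Hub.
# The split name is an assumption; see the dataset card for the real splits.
ds = load_dataset("conceptofmind/flan_cot_zsopt", split="train")
print(ds[0])
```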
> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

I will try with main right...
> Can you try installing via main? Aka: `pip install git+https://github.com/huggingface/accelerate`?
>
> And ideally can you tell us the output of `accelerate env`?

Looks like this is all set...
Hi @pacman100,

Thank you for the response. I will test out loading and saving the models with FSDP.

Best,
Enrico
> Yes, if the model is already wrapped in `FullyShardedDataParallel`, `accelerator.prepare` will just return the same

Thank you for the immediate response. I currently have the accelerate config as:...
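For reference, a minimal Python sketch of setting up FSDP through Accelerate's plugin API rather than the YAML config. These settings are illustrative assumptions, not the actual config from this thread, and the `use_orig_params` field requires a recent accelerate/PyTorch version:

```python
from torch.distributed.fsdp.fully_sharded_data_parallel import FullStateDictConfig
from accelerate import Accelerator, FullyShardedDataParallelPlugin

# Illustrative FSDP settings only, not the config referenced above.
fsdp_plugin = FullyShardedDataParallelPlugin(
    # Gather a full (unsharded) state dict on rank 0 when saving.
    state_dict_config=FullStateDictConfig(offload_to_cpu=True, rank0_only=True),
    # Keep the original parameter views; matches the use_orig_params=True
    # setting discussed below (availability depends on the accelerate version).
    use_orig_params=True,
)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```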
Additionally, I am having an issue when trying to resume from a model checkpoint with `use_orig_params=True`:

```python
# Note: the condition needs `and`, not `or` — with `or` it is always true,
# so the resume branch runs even when no checkpoint is set.
if CFG.RESUME_FROM_CHECKPOINT is not None and CFG.RESUME_FROM_CHECKPOINT != "":
    accelerator.print(f"Resuming from checkpoint {CFG.RESUME_FROM_CHECKPOINT}")
    ...
```
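For context, a minimal sketch of the save/resume pattern I am assuming here, using Accelerate's `save_state`/`load_state`; the tiny model and checkpoint path are placeholders, not the actual training setup:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(8, 8)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)

# Save model, optimizer, and RNG state together in one directory.
accelerator.save_state("checkpoints/latest")

# On resume, a plain truthiness check covers both None and "".
resume_path = "checkpoints/latest"
if resume_path:
    accelerator.load_state(resume_path)
```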
This has not been resolved and the PR addressing it was automatically closed.