DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
Below is the notebook link from your blog (https://huggingface.co/blog/personal-copilot): https://colab.research.google.com/drive/1Tz9KKgacppA4S6H4eo_sw43qEaC9lFLs?usp=sharing

```
!git pull
!python train.py \
    --model_name_or_path "bigcode/starcoder" \
    --dataset_name "smangrul/hf-stack-v1" \
    --subset "data" \
    --data_column "content" \
    --splits...
```
Hi, I fine-tuned with Accelerate using FSDP, but I do not know how to load the checkpoint for inference. The checkpoint output is as below: checkpoint-100 - optimizer_0 - __0_0.distcp...
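One route that can work here is consolidating the sharded distcp checkpoint into a single `torch.save` file and loading that into the base model. A minimal sketch, assuming PyTorch >= 2.2; the checkpoint sub-directory and the base model name are assumptions, not taken from the issue:

```python
# Minimal sketch: consolidate FSDP *.distcp shards into one file for inference.
# Assumes PyTorch >= 2.2; the paths and model name below are illustrative.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save
from transformers import AutoModelForCausalLM

# Point this at the sub-directory holding the *model* shards (not optimizer_0).
dcp_to_torch_save("checkpoint-100/pytorch_model_fsdp_0", "consolidated.pt")

model = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b")
state = torch.load("consolidated.pt", map_location="cpu")
# Depending on how the state was saved, the weights may sit under a "model" key.
model.load_state_dict(state.get("model", state))
```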
Hello author, thanks for your tutorial. I am using the dataset hf-codegen-v2, which has 370k rows. The validation set has about 1,850 rows. The batch size is 4. For other params,...
When I try to train a model with FSDP, I get the following error: *** TypeError: isinstance() arg 2 must be a type, a tuple of types, or a...
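This TypeError often surfaces when the FSDP auto-wrap policy is handed layer names that never resolve to actual classes. A hedged sketch of what the policy expects; `LlamaDecoderLayer` is only an example class, not necessarily the one from the failing run:

```python
# Hedged sketch: transformer_auto_wrap_policy needs a set of layer *classes*.
# If fsdp_transformer_layer_cls_to_wrap names a class that cannot be resolved,
# the set can end up holding None/str and isinstance() raises this TypeError.
import functools
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.llama.modeling_llama import LlamaDecoderLayer  # example

auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},  # classes, not {"LlamaDecoderLayer"}
)
```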
Hi, I am getting a CUDA out-of-memory error when I try to run the chat_assistant training's run_fsdp.sh script on a 34B model. Changing the model from 7B to...
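Before adding hardware, the usual memory levers are a smaller micro-batch with gradient accumulation, bf16, and activation checkpointing (plus `fsdp_offload_params: true` in the accelerate config). A hedged sketch; the model name and values are illustrative, not taken from run_fsdp.sh:

```python
# Hedged sketch of common OOM mitigations; names and values are illustrative.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",  # illustrative 34B checkpoint
    torch_dtype=torch.bfloat16,
    use_cache=False,  # the KV cache conflicts with gradient checkpointing
)
model.gradient_checkpointing_enable()  # trade recompute for activation memory

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # shrink the micro-batch...
    gradient_accumulation_steps=16,  # ...while keeping the effective batch size
    bf16=True,
)
```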
```
python train.py \
    --model_path "bigcode/starcoderbase-1b" \
    --dataset_name "smangrul/hf-stack-v1" \
    --subset "data" \
    --data_column "content" \
    --split "train" \
    --seq_length 2048 \
    --max_steps 2000 \
    --batch_size 1 \
    --gradient_accumulation_steps 1 \
    ...
```
In https://github.com/pacman100/DHS-LLM-Workshop/blob/main/chat_assistant/training/utils.py#L182C9-L182C19, what is the reason to set `device_map = 'auto'`? When I run it with Accelerate (with FSDP) I get the error:

```bash
ValueError: You can't train a model...
```
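For context, `device_map="auto"` dispatches layers across devices with big-model-inference hooks, while FSDP wants to shard and place parameters itself, so the two do not mix during training. A minimal sketch of loading without it; the model name is illustrative:

```python
# Minimal sketch: under accelerate + FSDP, load the model plainly and let
# FSDP handle sharding and placement. Model name is illustrative.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-1b",
    torch_dtype=torch.bfloat16,
    # device_map="auto"  <- omit when training with FSDP; its dispatch hooks
    # are what trigger the "You can't train a model..." ValueError.
)
```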
I am currently using the FSDP (Fully Sharded Data Parallel) approach with the Llama 2 70B model. The training process has begun, but I encounter an error when attempting to...
Hi @pacman100, firstly, thank you for the well-detailed article! I am writing to provide some feedback and seek clarification.

1. **Optimizer Selection:** The blog post demonstrates the use...
Thanks for your educational blog post and this repo. Could you please provide your scripts to fine-tune the 70B model in this repo? BTW, when I run your 7B fine-tune...