DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
Below is the notebook link from your blog (https://huggingface.co/blog/personal-copilot): https://colab.research.google.com/drive/1Tz9KKgacppA4S6H4eo_sw43qEaC9lFLs?usp=sharing

```
!git pull
!python train.py \
    --model_name_or_path "bigcode/starcoder" \
    --dataset_name "smangrul/hf-stack-v1" \
    --subset "data" \
    --data_column "content" \
    --splits...
```
Hi, I fine-tuned with Accelerate using FSDP, but I do not know how to load the checkpoint for inference. The checkpoint output is as below: checkpoint-100 - optimizer_0 - __0_0.distcp...
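One route that can work here is consolidating the sharded distcp checkpoint into a single `torch.save` file and loading that into the base model. A minimal sketch, assuming PyTorch >= 2.2; the checkpoint sub-directory and the base model name are assumptions, not taken from the issue:

```python
# Minimal sketch: consolidate FSDP *.distcp shards into one file for inference.
# Assumes PyTorch >= 2.2; the paths and model name below are illustrative.
import torch
from torch.distributed.checkpoint.format_utils import dcp_to_torch_save
from transformers import AutoModelForCausalLM

# Point this at the sub-directory holding the *model* shards (not optimizer_0).
dcp_to_torch_save("checkpoint-100/pytorch_model_fsdp_0", "consolidated.pt")

model = AutoModelForCausalLM.from_pretrained("bigcode/starcoderbase-1b")
state = torch.load("consolidated.pt", map_location="cpu")
# Depending on how the state was saved, the weights may sit under a "model" key.
model.load_state_dict(state.get("model", state))
```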
Hello author, thanks for your tutorial. I am using the dataset hf-codegen-v2, which has 370k rows. The validation set has about 1,850 rows. The batch size is 4. For other params,...
When I try to train a model with FSDP, I get the following error: *** TypeError: isinstance() arg 2 must be a type, a tuple of types, or a...
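This TypeError often surfaces when the FSDP auto-wrap policy is handed layer names that never resolve to actual classes. A hedged sketch of what the policy expects; `LlamaDecoderLayer` is only an example class, not necessarily the one from the failing run:

```python
# Hedged sketch: transformer_auto_wrap_policy needs a set of layer *classes*.
# If fsdp_transformer_layer_cls_to_wrap names a class that cannot be resolved,
# the set can end up holding None/str and isinstance() raises this TypeError.
import functools
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.llama.modeling_llama import LlamaDecoderLayer  # example

auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={LlamaDecoderLayer},  # classes, not {"LlamaDecoderLayer"}
)
```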
Hi, I am getting a CUDA out-of-memory error when I try to run the chat_assistant training's run_fsdp.sh script on a 34B model. Changing the model from 7B to...
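Before adding hardware, the usual memory levers are a smaller micro-batch with gradient accumulation, bf16, and activation checkpointing (plus `fsdp_offload_params: true` in the accelerate config). A hedged sketch; the model name and values are illustrative, not taken from run_fsdp.sh:

```python
# Hedged sketch of common OOM mitigations; names and values are illustrative.
import torch
from transformers import AutoModelForCausalLM, TrainingArguments

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",  # illustrative 34B checkpoint
    torch_dtype=torch.bfloat16,
    use_cache=False,  # the KV cache conflicts with gradient checkpointing
)
model.gradient_checkpointing_enable()  # trade recompute for activation memory

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,   # shrink the micro-batch...
    gradient_accumulation_steps=16,  # ...while keeping the effective batch size
    bf16=True,
)
```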
```
python train.py \
    --model_path "bigcode/starcoderbase-1b" \
    --dataset_name "smangrul/hf-stack-v1" \
    --subset "data" \
    --data_column "content" \
    --split "train" \
    --seq_length 2048 \
    --max_steps 2000 \
    --batch_size 1 \
    --gradient_accumulation_steps 1 \
    ...
```
In https://github.com/pacman100/DHS-LLM-Workshop/blob/main/chat_assistant/training/utils.py#L182C9-L182C19, what is the reason to set `device_map = 'auto'`? When I run it with Accelerate (with FSDP) I get the error:

```bash
ValueError: You can't train a model...
```
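For context, `device_map="auto"` dispatches layers across devices with big-model-inference hooks, while FSDP wants to shard and place parameters itself, so the two do not mix during training. A minimal sketch of loading without it; the model name is illustrative:

```python
# Minimal sketch: under accelerate + FSDP, load the model plainly and let
# FSDP handle sharding and placement. Model name is illustrative.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoderbase-1b",
    torch_dtype=torch.bfloat16,
    # device_map="auto"  <- omit when training with FSDP; its dispatch hooks
    # are what trigger the "You can't train a model..." ValueError.
)
```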
I am currently using the FSDP (Fully Sharded Data Parallel) approach with the Llama 2 70B model. The training process has begun, but I encounter an error when attempting to...
Hi @pacman100, firstly, thank you for the well-detailed article! I am writing to provide some feedback and seek clarification.

1. **Optimizer Selection:** The blog post demonstrates the use...
Thanks for your educational blog post and this repo. Could you please provide your scripts to fine-tune the 70B model in this repo? BTW, when I run your 7B fine-tune...