sujithjoseph

28 comments by sujithjoseph

If I shard the xxl base model like this
```
model.save_pretrained("sharded", max_shard_size="2000MB")
```
will it then help in fine-tuning it with a larger batch size, or should I load it int-8...

Since I have the CUDA 11.6 driver installed (Vertex AI), I was using torch 1.12.1+cu116. During installation, I see this:
```
ERROR: pip's dependency resolver does not currently take into...
```

@pacman100, I am not able to import `prepare_model_for_training` from main. I did `pip install -U git+https://github.com/huggingface/peft.git`. Should I install this branch instead: https://github.com/huggingface/peft/tree/younesbelkada-flan-t5-xl ? `ImportError: cannot import name 'prepare_model_for_training'`...

`pip install --upgrade -e git+https://github.com/huggingface/peft.git#egg=peft`
`pip install --upgrade git+https://github.com/huggingface/peft.git`

This helped to fix it.

```
from time import time

model.eval()
inputs = tokenizer(f'Explain Artificial Intelligence ', return_tensors="pt")
print(inputs)
times = []  # in ms
for i in range(100):
    with torch.no_grad():
        # with torch.cuda.amp.autocast():
        start = time()
        outputs...
```

```
from time import time

model.eval()
inputs = tokenizer(f'Explain Artificial Intelligence ', return_tensors="pt")
print(inputs)
times = []  # in ms
for i in range(100):
    with torch.no_grad():
        # with torch.cuda.amp.autocast():
        start = time()...
```
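The timing loop in the snippet above is cut off, so the following is only a minimal sketch of the same idea (collecting per-call latencies in milliseconds), not the original code. The helper names `time_calls`, `summarize`, and `fake_generate` are hypothetical, and `fake_generate` is a CPU stand-in for a call such as `model.generate(...)`.

```python
import statistics
from time import perf_counter


def time_calls(fn, n_runs=100, n_warmup=3):
    """Call fn() repeatedly and return the per-call latencies in milliseconds."""
    # Warm-up calls are not measured; first calls often include one-off setup cost.
    for _ in range(n_warmup):
        fn()
    times_ms = []
    for _ in range(n_runs):
        start = perf_counter()
        fn()
        # NOTE: when fn runs on a GPU, call torch.cuda.synchronize() here before
        # reading the clock, because CUDA kernels are launched asynchronously.
        times_ms.append((perf_counter() - start) * 1000.0)
    return times_ms


def summarize(times_ms):
    """Return mean, median and an approximate 95th percentile of the latencies."""
    ordered = sorted(times_ms)
    p95_index = min(len(ordered) - 1, int(0.95 * len(ordered)))
    return {
        "mean_ms": statistics.mean(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }


def fake_generate():
    """Hypothetical stand-in workload; replace with the real model call."""
    return sum(i * i for i in range(10_000))


latencies = time_calls(fake_generate, n_runs=20)
stats = summarize(latencies)
print(stats)
```

`perf_counter` is used here instead of `time` because it is a monotonic, high-resolution clock intended for measuring short intervals.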

This only happens when I load the model in 8-bit.
```
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(config.base_model_name_or_path, device_map={'':0}, load_in_8bit=True, torch_dtype=torch.float16)
device = torch.device("cuda")
model.cuda()
model = prepare_model_for_training(model)
model = PeftModel.from_pretrained(model,...
```

> @sujithjoseph, what is the DeepSpeed version being used? PEFT requires v0.8.0, as it resolved a bug related to training when a lot of params are frozen.

@pacman100 deepspeed==0.8.0
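A quick way to confirm an installed version against a minimum such as the v0.8.0 mentioned above is sketched below using only the standard library. The helpers `version_tuple` and `meets_minimum` are hypothetical names, and the parsing is deliberately simple (it ignores pre-release ordering), so treat it as an illustration rather than a full version parser.

```python
from importlib import metadata


def version_tuple(version):
    """Turn '0.8.0' or '1.12.1+cu116' into a tuple of ints such as (0, 8, 0)."""
    parts = []
    # Drop any local version label such as '+cu116' before splitting on dots.
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(package, minimum):
    """Return True/False for an installed package, or None if it is not installed."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)


# Prints True, False, or None depending on the local environment.
print(meets_minimum("deepspeed", "0.8.0"))
```

Where the `packaging` library is available, its `Version` class handles pre-releases and other edge cases that this simple tuple comparison does not.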

> Also, may I know what the input and output seq lengths of the dataset are?
>
> In my experiment on a summarization task using PEFT+DS_Z3 with 4 A100...

Thanks a lot, @pacman100! This is awesome! I will reduce the max length for the input sequence. I am trying to see if I can pass a Q and...