
Repo for fine-tuning Causal LLMs

5 Finetune_LLMs issues, sorted by recently updated

Using --model_name_or_path hivemind/gpt-j-6B-8bit: RuntimeError: The expanded size of the tensor (50257) must match the existing size (0) at non-singleton dimension 0. Target sizes: [50257]. Tensor sizes: [0]

I can't figure out how to fix this error. I am trying to run the example_run.txt from https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B/blob/main/finetuning_repo/example_run.txt. When I run it I get this error; it has an...
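
A minimal sketch of the call where this mismatch usually surfaces, assuming the trace points at `resize_token_embeddings` inside the training script (the snippet does not confirm this). The hivemind/gpt-j-6B-8bit checkpoint stores quantized weights, so the vanilla loader may leave the embedding matrix empty; loading the standard checkpoint first is one way to check that the rest of the command works.

```python
# Hedged reproduction sketch; the failure point is assumed, not confirmed.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B")

# The fine-tuning script resizes the embedding matrix to the tokenizer's vocab size.
# If the checkpoint's embedding weights were never materialized (existing size 0),
# this is where "expanded size ... must match the existing size (0)" is raised.
model.resize_token_embeddings(len(tokenizer))
print(model.get_input_embeddings().weight.shape)  # torch.Size([50257, 4096])
```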

In your example_run.txt command-line example for DeepSpeed, should "--block_size 2048" perhaps be set? Without it, it looks like the script picks up the GPT-2 default of 1024, but GPT-J rather...
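
For illustration, a sketch of the block-size chunking that run_clm-style scripts apply after tokenization (the function name and values are illustrative, not the repo's exact code). With the GPT-2 default of 1024, each training example fills only half of GPT-J's 2048-token context window.

```python
# Illustrative chunking: one long stream of token ids is split into
# fixed-length blocks; anything past the last full block is dropped.
def group_texts(token_ids, block_size=2048):
    total = (len(token_ids) // block_size) * block_size
    return [token_ids[i : i + block_size] for i in range(0, total, block_size)]

stream = list(range(5000))
print(len(group_texts(stream, block_size=1024)))  # 4 blocks of 1024 tokens
print(len(group_texts(stream, block_size=2048)))  # 2 blocks of 2048 tokens
```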

Hello, I am trying to finetune GPT-J-6B. I followed the instructions provided in the documentation, but I get this error. I tried changing batch_size=1 and gradient_accumulation_steps=4. Any idea...
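
For reference, a hedged sketch of the memory-related Trainer settings being tried; the argument names are standard `transformers.TrainingArguments` fields, but the values come from the report above and are not a verified fix.

```python
from transformers import TrainingArguments

# Illustrative values only; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="gptj-finetune",
    per_device_train_batch_size=1,   # smallest micro-batch per GPU
    gradient_accumulation_steps=4,   # effective batch of 4 per device
    gradient_checkpointing=True,     # trade compute for activation memory
    fp16=True,                       # half precision to reduce memory further
)
```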

![image](https://user-images.githubusercontent.com/47894192/236276560-049a0013-0937-4891-a433-1bd61f5863a1.png) Getting gradient overflow and a skipped step every 2 or so steps. Training the 13B LLaMA model on 7 A100s with a context window of 512. Below is the command line...
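
A common mitigation on A100s (a suggestion, not the repo's documented fix) is to run DeepSpeed in bf16 instead of fp16, since bf16 needs no dynamic loss scaling and so avoids the overflow/skipped-step messages. A minimal sketch of such a config, written from Python:

```python
import json

# Hedged sketch of a DeepSpeed config using bf16; the ZeRO stage and "auto"
# values are illustrative and should match whatever the existing config uses.
ds_config = {
    "bf16": {"enabled": True},            # replaces the "fp16" section
    "zero_optimization": {"stage": 2},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("ds_config_bf16.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```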