codellama

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Open fhlkm opened this issue 5 months ago • 3 comments

I am using WSL 2 with Ubuntu 22.04; this is the GPU information: [screenshot]

When I run "sudo lshw -C display": [screenshot]

I installed torch using this command: pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
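(A quick sanity check, not part of the original report: you can confirm that the installed wheel actually has CUDA support and that the GPU is visible from inside WSL 2.)

```python
import torch

print(torch.__version__)          # expected to end in "+cu113" for this wheel
print(torch.version.cuda)         # CUDA version the wheel was built against (None on CPU-only builds)
print(torch.cuda.is_available())  # must print True, or the model will silently run on CPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```

If `torch.cuda.is_available()` prints False, a CPU-only wheel or an incomplete WSL 2 GPU driver setup is the most likely cause.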

When I run the command "torchrun --nproc_per_node 1 example_instructions.py
--ckpt_dir CodeLlama-7b-Instruct/
--tokenizer_path CodeLlama-7b-Instruct/tokenizer.model
--max_seq_len 512 --max_batch_size 4",

it fails with: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'.

This is the full log:


:/mnt/c/Users/john.john/codelama/weight/codellama-main$ torchrun --nproc_per_node 1 example_instructions.py \
>     --ckpt_dir CodeLlama-7b-Instruct/ \
>     --tokenizer_path CodeLlama-7b-Instruct/tokenizer.model \
>     --max_seq_len 512 --max_batch_size 4
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
Loaded in 38.10 seconds
Traceback (most recent call last):
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 68, in <module>
    fire.Fire(main)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/john/.local/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/example_instructions.py", line 51, in main
    results = generator.chat_completion(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 351, in chat_completion
    generation_tokens, generation_logprobs = self.generate(
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/generation.py", line 164, in generate
    logits = self.model.forward(tokens[:, prev_pos:cur_pos], prev_pos)
  File "/home/john/.local/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 300, in forward
    h = layer(h, start_pos, freqs_cis, (mask.to(device) if mask is not None else mask))
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 252, in forward
    h = x + self.attention.forward(
  File "/mnt/c/Users/john.john/codelama/weight/codellama-main/llama/model.py", line 165, in forward
    xq, xk, xv = self.wq(x), self.wk(x), self.wv(x)
  File "/home/john/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/fairscale/nn/model_parallel/layers.py", line 290, in forward
    output_parallel = F.linear(input_parallel, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 433) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/home/john/.local/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 761, in main
    run(args)
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 752, in run
    elastic_launch(
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/john/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

example_instructions.py FAILED

Failures:
  <NO_OTHER_FAILURES>

Root Cause (first observed failure):
[0]:
  time      : 2024-01-24_16:51:25
  host      : company
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 433)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
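For context (this is an illustrative sketch, not part of the original log): the failing call is `F.linear` on float16 tensors that ended up on the CPU, and PyTorch 1.12 has no CPU kernel for Half-precision matmul. A minimal repro of the error, plus the usual fallback of casting to float32 when no GPU is usable:

```python
import torch
import torch.nn.functional as F

# float16 inputs on CPU hit the same addmm_impl_cpu_ kernel as the model's
# wq/wk/wv linear layers in llama/model.py.
x = torch.randn(4, 8, dtype=torch.float16)
w = torch.randn(16, 8, dtype=torch.float16)

try:
    F.linear(x, w)
except RuntimeError as e:
    # On affected versions (e.g. 1.12): "addmm_impl_cpu_" not implemented for 'Half'
    print(e)

# Workaround on CPU: run the matmul in float32 instead.
y = F.linear(x.float(), w.float())
print(y.shape)  # torch.Size([4, 16])
```

The real fix here, though, is getting the model onto the GPU, where Half matmul is supported.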

fhlkm avatar Jan 25 '24 01:01 fhlkm

I am getting the same error, but in CPU mode. It looks like your model is running on CPU as well.

akashdhruv avatar Jan 30 '24 02:01 akashdhruv

Hi @akashdhruv, my PC has an NVIDIA GPU; please check the screenshot above.

It is supposed to run on the GPU. Do you know why it only runs on the CPU?

fhlkm avatar Jan 30 '24 18:01 fhlkm

> Hi @akashdhruv, my PC has an NVIDIA GPU; please check the screenshot above.
>
> It is supposed to run on the GPU. Do you know why it only runs on the CPU?

I think you need to look into your system and torchrun configuration to figure out why the GPU is not being identified. Is your PyTorch installed with GPU support? If yes, maybe try:

export CUDA_VISIBLE_DEVICES="0"
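The same pinning can also be done from inside Python (a sketch under the assumption of a single-GPU machine; this would need to run before anything touches CUDA, e.g. at the very top of the script):

```python
import os

# Must be set before the first CUDA call, otherwise it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch

# False here means the wheel has no CUDA support or the WSL 2 driver
# setup is incomplete -- in either case torch falls back to CPU.
print(torch.cuda.is_available())
print(torch.cuda.device_count())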

akashdhruv avatar Jan 30 '24 19:01 akashdhruv