TencentPretrain
no GPU usage and only CPU running when inference
Training works well with GPUs. However, when I run inference, there is no GPU usage; only the CPU is running. I use the following script for inference:
python3 scripts/generate_lm.py --load_model_path models/llama-7b.bin --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
--test_path beginning.txt --prediction_path generated_sentence.txt \
--config_path models/llama/7b_config.json
Working on it.
Yes, only the CPU is used. When I added the following code:
torch.cuda.set_device(0)
model.to(torch.device("cuda"))
I got the following error:
Traceback (most recent call last):
File "llama_inference.py", line 110, in
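Since the traceback is cut off, the exact cause is unclear, but moving only the model to CUDA while the input tensors stay on the CPU is a common source of a device-mismatch error at this point. A minimal sketch of the fix (a toy module, not TencentPretrain's model): place the model and every input tensor on the same device before calling forward.

```python
import torch
import torch.nn as nn

# Pick CUDA when a GPU is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the loaded model; .to(device) moves its parameters.
model = nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the model's parameters.
x = torch.randn(1, 4, device=device)

with torch.no_grad():
    y = model(x)

print(y.device.type)  # matches device.type
```

In generate_lm.py this would mean calling `.to(device)` on the model once after loading, and on each input batch (e.g. the token-id tensor) before the forward pass.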