TencentPretrain
no GPU usage and only CPU running when inference
Training works well with GPUs. However, when I run inference, there is no GPU usage; only the CPU is running. I use the following script for inference:
python3 scripts/generate_lm.py --load_model_path models/llama-7b.bin --spm_model_path $LLaMA_7B_FOLDER/tokenizer.model \
--test_path beginning.txt --prediction_path generated_sentence.txt \
--config_path models/llama/7b_config.json
Working on it.
Yes, only the CPU is used. When I added the following code:
torch.cuda.set_device(0)
model.to(torch.device("cuda"))
I got the following error:
Traceback (most recent call last):
File "llama_inference.py", line 110, in
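Since the traceback is cut off, the exact cause is unclear, but moving only the model to CUDA while the input tensors stay on the CPU is a common source of a device-mismatch error at this point. A minimal sketch of the fix (a toy module, not TencentPretrain's model): place the model and every input tensor on the same device before calling forward.

```python
import torch
import torch.nn as nn

# Pick CUDA when a GPU is visible, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Toy stand-in for the loaded model; .to(device) moves its parameters.
model = nn.Linear(4, 2).to(device)

# Inputs must live on the same device as the model's parameters.
x = torch.randn(1, 4, device=device)

with torch.no_grad():
    y = model(x)

print(y.device.type)  # matches device.type
```

In generate_lm.py this would mean calling `.to(device)` on the model once after loading, and on each input batch (e.g. the token-id tensor) before the forward pass.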