DeepSpeed
[BUG] Process hangs when I use multi-GPU inference to loop over prompts entered at the terminal
My code follows the official documentation: https://www.deepspeed.ai/tutorials/inference-tutorial/
import os
import torch
import deepspeed
import transformers
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
transformers.logging.set_verbosity_error()
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'
local_rank = int(os.environ.get('LOCAL_RANK', '0'))
world_size = int(os.environ.get('WORLD_SIZE', '1'))
MODEL_NAMEorPATH = 'path/to/model'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAMEorPATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAMEorPATH)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=128, device=local_rank)
generator.model = deepspeed.init_inference(generator.model, mp_size=world_size, dtype=torch.float16, replace_with_kernel_inject=True)
while True:
    prompt = input("Model prompt >>> ")
    if prompt == 'quit':
        break
    print(generator(prompt)[0]['generated_text'])
I launch it from the terminal with: deepspeed --num_gpus 2 mycode.py

An error occurred. First, "Model prompt >>> " was printed twice. After checking the related documentation and code, I learned that DeepSpeed starts one process per GPU for multi-GPU inference, so the prompt is printed once by each process. I then have to type the prompt twice in the terminal before execution continues, but after that there is no output: both GPUs stay at 100% utilization and the GPU memory usage stops changing.

How should I correctly use multi-GPU, multi-process inference to loop over prompts entered at the terminal, run inference for each one, and get the result back every time?
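From what I have read, my guess is that the hang happens because each of the two processes runs its own input() loop: with tensor-parallel inference every rank has to call the model with the same prompt, so if the ranks read different strings (or block on input() at different times) the collective operations never line up and both GPUs spin at 100%. Below is a minimal sketch of the pattern I think is needed, where only rank 0 reads from the terminal and then broadcasts the prompt to the other rank. The explicit deepspeed.init_distributed() call, the torch.distributed.broadcast_object_list usage, and the rank-0-only printing are my assumptions, not something taken from the tutorial:

import os
import torch
import torch.distributed as dist
import deepspeed
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

local_rank = int(os.environ.get('LOCAL_RANK', '0'))
world_size = int(os.environ.get('WORLD_SIZE', '1'))
torch.cuda.set_device(local_rank)

# Create the process group up front so the ranks can exchange the prompt.
deepspeed.init_distributed()

MODEL_NAMEorPATH = 'path/to/model'  # placeholder path, same as in my script above
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAMEorPATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAMEorPATH)
generator = pipeline('text-generation', model=model, tokenizer=tokenizer,
                     max_new_tokens=128, device=local_rank)
generator.model = deepspeed.init_inference(generator.model, mp_size=world_size,
                                           dtype=torch.float16,
                                           replace_with_kernel_inject=True)

while True:
    # Only rank 0 talks to the terminal, so the prompt is printed once.
    payload = [input("Model prompt >>> ")] if local_rank == 0 else [None]
    # Send rank 0's string to every rank so all processes call the model
    # with the same input and the tensor-parallel collectives stay in sync.
    dist.broadcast_object_list(payload, src=0)
    prompt = payload[0]
    if prompt == 'quit':
        break
    output = generator(prompt)[0]['generated_text']
    if local_rank == 0:  # print from one rank only to avoid duplicate output
        print(output)

Is something like this the recommended way to drive an interactive loop with --num_gpus 2, or does DeepSpeed provide a built-in mechanism for feeding the same prompt to all ranks?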