
[BUG] A process bug occurs when I use multi-GPU inference to loop over prompts entered in the terminal

Open hardlipay opened this issue 1 year ago • 0 comments

My code follows the official documentation: https://www.deepspeed.ai/tutorials/inference-tutorial/

import os
import torch
import deepspeed
import transformers

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

transformers.logging.set_verbosity_error()
os.environ['CUDA_VISIBLE_DEVICES'] = '0,1'

# The deepspeed launcher sets these for each process; fall back to single-process defaults.
local_rank = int(os.getenv('LOCAL_RANK', '0'))
world_size = int(os.getenv('WORLD_SIZE', '1'))

MODEL_NAMEorPATH = 'path/to/model'
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAMEorPATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAMEorPATH)

generator = pipeline('text-generation', model=model, tokenizer=tokenizer, max_new_tokens=128, device=local_rank)

generator.model = deepspeed.init_inference(generator.model, mp_size=world_size, dtype=torch.float16, replace_with_kernel_inject=True)

while True:
    prompt = input("Model prompt >>> ")
    if prompt == 'quit':
        break
    print(generator(prompt)[0]['generated_text'])

I run the script from the terminal with: deepspeed --num_gpus 2 mycode.py

The first problem is that "Model prompt >>>" is printed twice. After checking the related documentation and code, I learned that DeepSpeed starts one process per GPU for multi-GPU inference, so each process prints its own prompt. I then have to enter the prompt twice in the terminal before execution continues, but nothing is ever printed. At that point both GPUs sit at 100% utilization and their video memory usage does not change.

How should I use multi-GPU, multi-process inference so that I can loop over the prompts, run inference correctly on each one, and get the result for every prompt?
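One common pattern for this situation, sketched here as an assumption rather than a confirmed fix for this issue, is to read the prompt on rank 0 only and send it to the other ranks with `torch.distributed.broadcast_object_list`. Every process then runs generation on the same text, and only rank 0 prints the result. The helper names `next_prompt` and `prompt_loop`, and the `read`, `broadcast`, and `emit` parameters, are hypothetical and exist only for illustration. The sketch assumes `deepspeed.init_inference` has already initialized `torch.distributed`.

```python
def next_prompt(rank, read=input, broadcast=None):
    """Read one prompt on rank 0 and share it with every other rank.

    `read` and `broadcast` are hypothetical injection points so the helper
    can be exercised without GPUs; by default they use input() and
    torch.distributed.broadcast_object_list.
    """
    if broadcast is None:
        # Deferred import: only needed when running under a real launcher.
        import torch.distributed as dist

        def broadcast(box):
            # Fills `box` in place on non-source ranks with rank 0's objects.
            dist.broadcast_object_list(box, src=0)

    # Only rank 0 touches stdin; the other ranks pass a placeholder.
    box = [read("Model prompt >>> ") if rank == 0 else None]
    broadcast(box)
    return box[0]


def prompt_loop(generator, rank, read=input, broadcast=None, emit=print):
    """Loop over prompts so that all ranks stay in lockstep."""
    while True:
        prompt = next_prompt(rank, read, broadcast)
        if prompt == 'quit':
            # Every rank received 'quit', so every rank leaves the loop together.
            break
        # All ranks must call the model: tensor-parallel layers communicate
        # across processes during the forward pass.
        text = generator(prompt)[0]['generated_text']
        if rank == 0:
            emit(text)
```

In the script above, the `while True:` loop would become `prompt_loop(generator, local_rank)`, which is valid on a single node where the local rank equals the global rank. With the NCCL backend, each process should select its own GPU first (for example with `torch.cuda.set_device(local_rank)`) before broadcasting Python objects.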

hardlipay · May 06 '23 06:05