fastertransformer_backend
GPT-J streaming: getting garbage response
Description
branch: main
fastertransformer docker: 22.12
Reproduced Steps
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} ${TRITON_DOCKER_IMAGE} bash
# now in docker
export WORKSPACE=$(pwd)
export SRC_MODELS_DIR=${WORKSPACE}/models
git clone https://gitlab-master.nvidia.com/dl/FasterTransformer/FasterTransformer.git # used to convert the checkpoint and to check the Triton output
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json -P models
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt -P models
wget https://mystic.the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd
mkdir ${SRC_MODELS_DIR}/gptj/ -p
tar -axf step_383500_slim.tar.zstd -C ${SRC_MODELS_DIR}/gptj/
pip install scipy
python3 ${WORKSPACE}/FasterTransformer/examples/pytorch/gptj/utils/gptj_ckpt_convert.py \
--output-dir ${WORKSPACE}/all_models/gptj/fastertransformer/1 \
--ckpt-dir ${SRC_MODELS_DIR}/gptj/step_383500/ \
--n-inference-gpus 2
Enabled decoupled mode in config.pbtxt.
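For reference, this is roughly the change I made (a minimal excerpt; the rest of config.pbtxt is assumed to follow the stock GPT-J example, and tensor_para_size is set to 2 to match --n-inference-gpus):

# all_models/gptj/fastertransformer/config.pbtxt (excerpt)
model_transaction_policy {
  decoupled: True
}
parameters {
  key: "tensor_para_size"
  value: { string_value: "2" }
}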
Streaming works, but the response is garbage and the context is missing from the generated text. The model works fine when streaming is not used. Is there a special step or parameter I am missing that causes this issue with streaming?
@byshiue
Please provide the scripts showing how you run streaming on GPT-J.
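Sure, this is roughly the client I use, reduced to a minimal sketch. It assumes the standard tritonclient gRPC streaming API and the usual FasterTransformer GPT-J tensor names (input_ids, input_lengths, request_output_len, output_ids); the tokenizer step is omitted and the token IDs are hard-coded, so treat the names, shapes, and sampling parameters as assumptions rather than a verified script:

import numpy as np
import tritonclient.grpc as grpcclient
from functools import partial

MODEL_NAME = "fastertransformer"   # assumed model name in the Triton model repository
URL = "localhost:8001"             # assumed gRPC endpoint

def stream_callback(results, result, error):
    # Called once per streamed response when decoupled mode is enabled.
    if error:
        print("error:", error)
    else:
        results.append(result.as_numpy("output_ids"))
        print("partial output_ids:", results[-1])

def main():
    # Token IDs are hard-coded here; in the real client they come from the GPT-2 BPE tokenizer
    # built from gpt2-vocab.json / gpt2-merges.txt.
    input_ids = np.array([[818, 257, 1445, 326]], dtype=np.uint32)
    input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
    request_output_len = np.array([[32]], dtype=np.uint32)

    inputs = [
        grpcclient.InferInput("input_ids", input_ids.shape, "UINT32"),
        grpcclient.InferInput("input_lengths", input_lengths.shape, "UINT32"),
        grpcclient.InferInput("request_output_len", request_output_len.shape, "UINT32"),
    ]
    inputs[0].set_data_from_numpy(input_ids)
    inputs[1].set_data_from_numpy(input_lengths)
    inputs[2].set_data_from_numpy(request_output_len)

    results = []
    client = grpcclient.InferenceServerClient(url=URL)
    client.start_stream(callback=partial(stream_callback, results))
    client.async_stream_infer(model_name=MODEL_NAME, inputs=inputs, request_id="1")
    client.stop_stream()  # close the stream after the pending responses are handled

if __name__ == "__main__":
    main()

The non-streaming path sends the same inputs through a plain infer() call, and there the output decodes correctly.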