fastertransformer_backend
GPT-J streaming: getting garbage response
Description
branch: main
fastertransformer docker: 22.12
Reproduced Steps
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} ${TRITON_DOCKER_IMAGE} bash
# now in docker
export WORKSPACE=$(pwd)
export SRC_MODELS_DIR=${WORKSPACE}/models
git clone https://gitlab-master.nvidia.com/dl/FasterTransformer/FasterTransformer.git # used to convert the checkpoint and to check the Triton output
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-vocab.json -P models
wget https://s3.amazonaws.com/models.huggingface.co/bert/gpt2-merges.txt -P models
wget https://mystic.the-eye.eu/public/AI/GPT-J-6B/step_383500_slim.tar.zstd
mkdir ${SRC_MODELS_DIR}/gptj/ -p
tar -axf step_383500_slim.tar.zstd -C ${SRC_MODELS_DIR}/gptj/
pip install scipy
python3 ${WORKSPACE}/FasterTransformer/examples/pytorch/gptj/utils/gptj_ckpt_convert.py \
--output-dir ${WORKSPACE}/all_models/gptj/fastertransformer/1 \
--ckpt-dir ${SRC_MODELS_DIR}/gptj/step_383500/ \
--n-inference-gpus 2
Enabled decoupled mode in config.pbtxt.
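For reference, this is roughly the change I made (a minimal excerpt; the rest of config.pbtxt is assumed to follow the stock GPT-J example, and tensor_para_size is set to 2 to match --n-inference-gpus):

# all_models/gptj/fastertransformer/config.pbtxt (excerpt)
model_transaction_policy {
  decoupled: True
}
parameters {
  key: "tensor_para_size"
  value: { string_value: "2" }
}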
Streaming works, but the response is garbage and the context is missing from the generated text. The model works fine when streaming is not used. Is there a special step or parameter I am missing that causes this issue with streaming?
@byshiue
Please provide the scripts showing how you run streaming on GPT-J.
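Sure, this is roughly the client I use, reduced to a minimal sketch. It assumes the standard tritonclient gRPC streaming API and the usual FasterTransformer GPT-J tensor names (input_ids, input_lengths, request_output_len, output_ids); the tokenizer step is omitted and the token IDs are hard-coded, so treat the names, shapes, and sampling parameters as assumptions rather than a verified script:

import numpy as np
import tritonclient.grpc as grpcclient
from functools import partial

MODEL_NAME = "fastertransformer"   # assumed model name in the Triton model repository
URL = "localhost:8001"             # assumed gRPC endpoint

def stream_callback(results, result, error):
    # Called once per streamed response when decoupled mode is enabled.
    if error:
        print("error:", error)
    else:
        results.append(result.as_numpy("output_ids"))
        print("partial output_ids:", results[-1])

def main():
    # Token IDs are hard-coded here; in the real client they come from the GPT-2 BPE tokenizer
    # built from gpt2-vocab.json / gpt2-merges.txt.
    input_ids = np.array([[818, 257, 1445, 326]], dtype=np.uint32)
    input_lengths = np.array([[input_ids.shape[1]]], dtype=np.uint32)
    request_output_len = np.array([[32]], dtype=np.uint32)

    inputs = [
        grpcclient.InferInput("input_ids", input_ids.shape, "UINT32"),
        grpcclient.InferInput("input_lengths", input_lengths.shape, "UINT32"),
        grpcclient.InferInput("request_output_len", request_output_len.shape, "UINT32"),
    ]
    inputs[0].set_data_from_numpy(input_ids)
    inputs[1].set_data_from_numpy(input_lengths)
    inputs[2].set_data_from_numpy(request_output_len)

    results = []
    client = grpcclient.InferenceServerClient(url=URL)
    client.start_stream(callback=partial(stream_callback, results))
    client.async_stream_infer(model_name=MODEL_NAME, inputs=inputs, request_id="1")
    client.stop_stream()  # close the stream after the pending responses are handled

if __name__ == "__main__":
    main()

The non-streaming path sends the same inputs through a plain infer() call, and there the output decodes correctly.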