text-generation-inference

falcon-7b-instruct model unexpected text generation without flash attention

Open chironito opened this issue 2 years ago • 0 comments

System Info

Version: ghcr.io/huggingface/text-generation-inference:latest
OS: Ubuntu 22.04 LTS
GPU: 1 x A100 80GB GPU on Azure

[Screenshot: 2023-07-04_14-51-20]

Information

  • [X] Docker
  • [ ] The CLI directly

Tasks

  • [X] An officially supported command
  • [ ] My own modifications

Reproduction

sudo docker run --gpus all -p 8080:80 -v /mnt/ext/data:/data -e USE_FLASH_ATTENTION=FALSE ghcr.io/huggingface/text-generation-inference:latest --model-id tiiuae/falcon-7b-instruct --trust-remote-code

[Screenshot: 2023-07-04_14-29-53]

Expected behavior

sudo docker run --gpus all -p 8080:80 -v /mnt/ext/data:/data -e USE_FLASH_ATTENTION=TRUE ghcr.io/huggingface/text-generation-inference:latest --model-id tiiuae/falcon-7b-instruct --trust-remote-code

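With flash attention switched on (USE_FLASH_ATTENTION=TRUE), we get correct generation. The same model with flash attention switched off should generate equivalent text.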

[Screenshot: 2023-07-04_14-36-53]
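
To compare the two configurations on identical input, the same prompt can be sent to each container in turn. Below is a minimal sketch, assuming the server is reachable at http://localhost:8080 (from the -p 8080:80 mapping in the commands above) and that it exposes TGI's /generate endpoint. The helper names and the prompt are hypothetical and not part of the original report; greedy decoding (do_sample set to false) is chosen here only so that the two runs are directly comparable.

```python
import json
import urllib.request

# Assumption: the container started with the commands above listens on localhost:8080.
TGI_URL = "http://localhost:8080/generate"


def build_payload(prompt: str, max_new_tokens: int = 64) -> dict:
    """Build a /generate request body; sampling is disabled so runs are repeatable."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "do_sample": False},
    }


def generate(prompt: str, url: str = TGI_URL, timeout: float = 60.0) -> str:
    """POST the prompt to the running server and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["generated_text"]


# Usage (requires a running container; the prompt is a hypothetical example):
#   print(generate("What is the capital of France?"))
```

Running the usage line once against the USE_FLASH_ATTENTION=FALSE container and once against the USE_FLASH_ATTENTION=TRUE container gives two outputs for the same prompt that can be compared side by side.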

chironito, Jul 04 '23 09:07