
Weird behavior when using the Helsinki-NLP/opus-mt-en-ar

Open wolfassi123 opened this issue 11 months ago • 1 comment

System Info

Model being used: Helsinki-NLP/opus-mt-en-ar Hardware used: A100 Deployment specificities: Deployed using TGI and pinging the model with the help of the InferenceClient class from huggingface_hub

Information

  • [ ] Docker
  • [ ] The CLI directly

Tasks

  • [ ] An officially supported command
  • [ ] My own modifications

Reproduction

from huggingface_hub import InferenceClient
client = InferenceClient(f"http://{url}:{port}")
results = client.text_generation(
    ">>ara<< We will be going tomorrow ",
)
print(results)

Expected behavior

I deployed the model using TGI with the following command:

docker run --name en_ar_translation --gpus device=0 --shm-size 1g -p 1111:80 \
    -v $volume:/data ghcr.io/huggingface/text-generation-inference:1.4 \
    --model-id $model --trust-remote-code --max-input-length 20 --max-total-tokens 128

The Docker container came up, but whenever I query the model it returns a single random word repeated over and over. When I run the same model through the transformers pipeline instead, it translates correctly and everything is fine.

I know the architecture is not natively supported by TGI, but according to the documentation that should only mean I cannot shard the model or use Flash Attention. Would it also cause such a huge drop in output quality?
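As context for the prompt above: Marian multilingual checkpoints such as opus-mt-en-ar select the target language through a ">>xxx<<" prefix token in the input text. A minimal sketch (the helper name is hypothetical, not part of any library) of how such a prompt is assembled:

```python
def with_target_lang(text: str, lang: str) -> str:
    """Prepend the Marian-style target-language token to the input text.

    Multilingual Marian models route the output language via a ">>lang<<"
    prefix, e.g. ">>ara<<" for Arabic, as used in the reproduction above.
    """
    return f">>{lang}<< {text}"

print(with_target_lang("We will be going tomorrow", "ara"))
```

If TGI's tokenizer handling drops or mangles this prefix token, the model would lose its language-routing signal, which is one plausible reason for degenerate repeated output.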

wolfassi123 avatar Mar 15 '24 13:03 wolfassi123

I just noticed that summarization tasks in general do not work with TGI: if you deploy a summarization model with TGI, the output is basically gibberish. To work around it, you need to do the following:

client = InferenceClient(model="http://0.0.0.0:8000")
client.summarization(text="TEST", model="facebook/bart-large-cnn")

You have to pass the model you want to use explicitly. That routes the request to the named model instead of the deployed endpoint, rendering the container that was deployed using TGI useless.
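The routing behavior described above can be sketched as follows (this is a hypothetical illustration of the precedence, not huggingface_hub's actual implementation): a per-call model argument takes precedence over the endpoint URL configured when the client was constructed, which is why the deployed TGI container ends up unused.

```python
def resolve_target(endpoint_url=None, model_id=None):
    """Illustrative sketch: pick where a request would be sent.

    Mirrors the behavior reported above: if a model is passed on the call
    itself, it wins over the endpoint URL the client was created with.
    """
    return model_id if model_id is not None else endpoint_url

# Request goes to the deployed TGI endpoint.
print(resolve_target(endpoint_url="http://0.0.0.0:8000"))
# Per-call model overrides the endpoint, bypassing the container.
print(resolve_target(endpoint_url="http://0.0.0.0:8000",
                     model_id="facebook/bart-large-cnn"))
```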

Is summarization not supported by TGI?

wolfassi123 avatar Mar 19 '24 12:03 wolfassi123

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 19 '24 01:04 github-actions[bot]