text-generation-inference
`parameters.stop` only works if the stop sequences align with token boundaries
System Info
TGI Version: huggingface/text-generation-inference:1.0.3
(official docker image)
Model: abacaj/starcoderbase-1b-sft
OS: Ubuntu 20.04
GPU: RTX 3090 (runpod.io)
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
docker run --gpus all --shm-size 1g -p 8080:80 -v $PWD/data:/data ghcr.io/huggingface/text-generation-inference:1.0.3 --model-id abacaj/starcoderbase-1b-sft
`Par` works:
curl http://localhost:8080 \
-X POST \
-d '{"inputs": "The capital city of France", "parameters":{"stop":["Par"]}}}' \
-H "Content-Type: application/json"
# [{"generated_text":"The capital city of France is Par"}]
`Pari` doesn't work:
curl http://localhost:8080 \
-X POST \
-d '{"inputs": "The capital city of France", "parameters":{"stop":["Pari"]}}}' \
-H "Content-Type: application/json"
# [{"generated_text":"The capital city of France is Paris.\n\nYou are given a string array named cities and an integer array named cap"}]
Expected behavior
Other APIs, such as OpenAI's, guarantee that the stop sequence text will not appear in the output regardless of the details of the underlying tokenizer, and that's what I expected here. So rather than just checking whether tokens (or groups of tokens) match any of the stop sequences, TGI should check against the full recently generated segment of the output, i.e. at character resolution rather than token resolution.
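A minimal sketch of that character-resolution check (illustrative only; the function name is mine, not TGI's):

```python
def find_stop_cut(text: str, stop_sequences: list[str]) -> int | None:
    """Return the index just past the earliest stop sequence in `text`,
    or None if no stop sequence occurs.

    Matching runs over the full decoded string, so a stop sequence is
    found even when it straddles token boundaries.
    """
    best_start = None
    best_end = None
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1 and (best_start is None or idx < best_start):
            best_start, best_end = idx, idx + len(seq)
    return best_end

# Re-check the accumulated text after decoding each new token:
text = "The capital city of France is Paris."
cut = find_stop_cut(text, ["Pari"])
if cut is not None:
    # Cutting just past the match mirrors TGI's current behaviour of
    # including the stop text; use text[:cut - len(seq)] to exclude it.
    text = text[:cut]  # "The capital city of France is Pari"
```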
I posted this in this closed issue, but I don't think it was noticed.
Note that this issue isn't specific to abacaj/starcoderbase-1b-sft. I've replicated it with Llama 2 as well.
Bumped into the same thing. I guess there is a variety of implementations and interpretations of stop sequences out there. I had to do the same thing in Python when manually implementing stop words, removing them from the response.
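For reference, the post-processing I mean looks roughly like this (just a sketch, not my exact code):

```python
def trim_at_stop_words(response: str, stop_words: list[str]) -> str:
    """Cut the response at the earliest occurrence of any stop word,
    excluding the stop word itself (OpenAI-style behaviour)."""
    earliest = len(response)
    for word in stop_words:
        idx = response.find(word)
        if idx != -1:
            earliest = min(earliest, idx)
    return response[:earliest]

print(trim_at_stop_words("The capital city of France is Paris.", ["Pari"]))
# -> "The capital city of France is "
```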
IDEA: I suppose this could either be altered to not include the stop words in the response, or a new option could be introduced. The latter reminds me of the `end_sequences` option in Cohere's endpoint reference:
> The generated text will be cut at the beginning of the earliest occurrence of an end sequence. The sequence will be excluded from the text.
@flexchar I like the solution of having two different options, as you've shown there. I don't like the idea of a breaking change to how `stop` works.
The main thing I'm actually concerned about here, though, is that `stop` doesn't currently work, i.e. I think the first priority is to fix the bug that causes `stop` to trigger only when a sequence aligns with token boundaries.
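One wrinkle for any character-level fix: with streaming, the server also needs to hold back text that could still grow into a stop sequence, or partial stop text leaks to the client. A sketch of that hold-back logic (illustrative; the function name is made up):

```python
def split_for_streaming(buffer: str, stop_sequences: list[str]) -> tuple[str, str]:
    """Split `buffer` into (text safe to emit now, text to hold back).

    Assumes a full-match check has already cut the buffer if a complete
    stop sequence occurred. Holds back the longest suffix of the buffer
    that is a proper prefix of some stop sequence, so partial stop text
    never reaches a streaming client.
    """
    hold = 0
    for seq in stop_sequences:
        # Longest suffix of `buffer` that is a proper prefix of `seq`.
        for k in range(min(len(seq) - 1, len(buffer)), 0, -1):
            if buffer.endswith(seq[:k]):
                hold = max(hold, k)
                break
    return buffer[: len(buffer) - hold], buffer[len(buffer) - hold :]

emit, held = split_for_streaming("The capital city of France is Par", ["Pari"])
# emit == "The capital city of France is ", held == "Par"
```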
I've also been bitten by this.
I'm not an expert, but would it make sense to implement "token healing", as described by the Guidance project? https://github.com/guidance-ai/guidance/blob/main/notebooks/token_healing.ipynb
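For concreteness, here's a toy sketch of the token-healing idea using a made-up vocabulary (not Guidance's actual implementation):

```python
# Toy vocabulary standing in for a real tokenizer's vocab.
VOCAB = ["http", ":", "://", " Par", "is", " Paris"]

def heal(prompt_tokens: list[str]) -> tuple[list[str], list[str]]:
    """Back up the last prompt token and return (trimmed prompt, tokens
    allowed as the first generated token).

    The first sampled token is constrained to start with the text of the
    removed token, so a boundary the user typed mid-token gets 'healed'.
    """
    *rest, last = prompt_tokens
    allowed = [t for t in VOCAB if t.startswith(last)]
    return rest, allowed

# A prompt ending in "http:" tokenizes as ["http", ":"]; healing removes
# the ":" and restricts the first generated token to ones beginning with
# ":" (here ":" itself or "://").
print(heal(["http", ":"]))  # -> (['http'], [':', '://'])
```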
I have a suspicion (but no data, of course) that this is a common issue that impacts the quality of model outputs, but just not badly enough that anyone notices enough to fix it.
edit - I just realized this post is describing a slightly different problem, where the stop characters (not tokens) don't align with token boundaries. So a different issue, I suppose.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
(Not stale.)