vLLM: weird beam search outputs

While experimenting with vLLM, I stumbled upon strange behavior that might indicate an issue when beam search is used. I started the server with:

python3 -m vllm.entrypoints.api_server --model mosaicml/mpt-30b

When I request:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "temperature": 0,
        "n":1
    }'

I get a more or less expected answer:

{"text": ["San Francisco is a city of neighborhoods, and each has its own character. The following is a brief description of the most popular areas.\n\n\u2022 **Downtown** (also called SoMa, for South of Market) is the city's financial district, with a few hotels, restaurants, and shops.\n\n\u2022 **Union Square"]}

However, when I set use_beam_search to true and request n=4:

curl http://vllm.ai/generate \
    -d '{
        "prompt": "San Francisco is a",
        "max_tokens":64,
        "use_beam_search": true,
        "temperature": 0,
        "n":4
    }'

I get:

{"text": [
  "San Francisco is a very city, visit each with its own personality and and The\n\n###  Fisherman's Wharf**Fisherman's Wharf is the city's is the  | ###  Top Sights  | ###  Sights  | ###  Eating  | ###  Drinking & Night",
   "San Francisco is a great place to live, but it's character and flavor. TheTheTheFisherman's Wharf\n\nThe city\u00bb  Fisherman's Wharf  is the city's most aSights  | ###  Activities  | ###  Courses  | ###  Festivals  |",
  "San Francisco is a big that for be a and and play distinct character.\n\n###  Neighborhoods at a Glance\n\n###  Name  | ###  Character\n\n---|---\n\nDowntown &\n\n---|---|---\n\nDowntown\n\n---|---|---|---Eating  | ###  Drinking & Nightlife",
 "San Francisco is a city of neighborhoods, and work, and play. charm. The\u2022\u2022 **Downtown in Brief & the Piers\n\nThe, San Francisco's most(Click here ) is theThe Marina  | The city'sEating  | ###  Drinking & Nightlife  | ###  Entertainment"
]}

I'm not sure, but it looks like the answers are corrupted or intermingled after a certain number of tokens (as if tokens were coming from different answers?).
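
A rough way to make "corrupted" a bit more measurable is to flag immediately repeated words, such as the "and and" and "TheTheThe" visible in the outputs above. The following is only a heuristic sketch: the function name `repeat_glitches` and both regular expressions are assumptions made for illustration, not part of vLLM. It will miss other kinds of corruption and can flag legitimate English such as "had had".

```python
import re
from typing import List

# A word followed by whitespace and the same whole word again ("and and").
_SPACED = re.compile(r"\b(\w+)\s+\1\b")
# A chunk of 3+ word characters at a word start, repeated with no separator ("TheThe").
_GLUED = re.compile(r"\b(\w{3,})\1")


def repeat_glitches(text: str) -> List[str]:
    """Return words that are immediately repeated in `text` (heuristic only)."""
    spaced = [m.group(1) for m in _SPACED.finditer(text)]
    glued = [m.group(1) for m in _GLUED.finditer(text)]
    return spaced + glued
```

Running this over each returned string and comparing the counts across values of n might help show where the outputs start to degrade, although a clean result does not prove the output is correct.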

Interestingly enough, the problem manifests only with n>2. I've tested n=3, n=4 and n=5; for n=2 the output looks correct:

{"text": [
"San Francisco is a great place to live, but it's not a great place to work. It's a city that's beautiful culture, its beautiful architecture, and its many attractions. Whether you're looking for a fun day trip or a longer stay, San Francisco has something for everyone. Here are some of the best things to do in", 
"San Francisco is a city of for be a and it is also a great place to visit. The city is known for its diverse neighborhoods, its unique architecture, and its beautiful natural setting. There are visiting to explore the city's many neighborhoods, or you're just looking for to offer everyone\nThe just a few of the many things to"
]}
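
To compare different values of n side by side, the requests above could be scripted. This is a minimal sketch using only the Python standard library. The helper names `build_payload`, `generate` and `sweep` are hypothetical; the endpoint URL, the JSON fields and the "text" key in the response are taken from the curl examples and outputs above. As in those examples, n=1 is sent without use_beam_search.

```python
import json
import urllib.request


def build_payload(prompt, n, use_beam_search=True, max_tokens=64):
    """Build the JSON body used in the curl requests above."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0, "n": n}
    if use_beam_search:
        payload["use_beam_search"] = True
    return payload


def generate(url, payload, timeout=120):
    """POST the payload to the server and return the list under the "text" key."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]


def sweep(url, prompt, n_values=(1, 2, 3, 4, 5)):
    """Collect outputs for several n; beam search is only enabled for n > 1."""
    return {
        n: generate(url, build_payload(prompt, n, use_beam_search=(n > 1)))
        for n in n_values
    }
```

For example, `sweep("http://vllm.ai/generate", "San Francisco is a")` would return a dictionary mapping each n to its list of generated texts, which makes it easier to see at which n the outputs start to degrade.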

Originally posted by @emsi in https://github.com/vllm-project/vllm/issues/334#issuecomment-1619134531