LLaVA-NeXT
Poor quality output for Qwen 72B
server:
export CUDA_VISIBLE_DEVICES="3,4,5,6"
python -m sglang.launch_server --model-path lmms-lab/llava-next-72b --tokenizer-path lmms-lab/llavanext-qwen-tokenizer --port=30010 --host="0.0.0.0" --tp-size=4 --random-seed=1234 --context-length=32768 &> 72b.log &
client:
"""
Usage:
# Installing latest llava-next: pip install git+https://github.com/LLaVA-VL/LLaVA-NeXT.git
# Installing latest sglang.
# Endpoint Service CLI:
# python -m sglang.launch_server --model-path lmms-lab/llava-next-72b --tokenizer-path lmms-lab/llavanext-qwen-tokenizer --port=30000 --host="127.0.0.1" --tp-size=4
python3 http_qwen_llava_test.py
Output:
"Two children pose with a large teddy bear, one holding a smaller stuffed bear, in a room with an American flag and potted plants."
"""
import argparse
import asyncio
import json
import time
import copy
import aiohttp
import requests
from llava.conversation import (
    default_conversation,
    conv_templates,
    SeparatorStyle,
    conv_llava_llama_3,
    conv_qwen,
)


async def send_request(url, data, delay=0):
    await asyncio.sleep(delay)
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data) as resp:
            output = await resp.json()
    return output


async def test_concurrent(args):
    url = f"{args.host}:{args.port}"

    # Build a ChatML-style prompt from the Qwen conversation template.
    prompt = "<image>\nPlease generate caption towards this image."
    conv_template = copy.deepcopy(conv_qwen)
    conv_template.append_message(role="user", message=prompt)
    prompt_with_template = conv_template.get_prompt()

    response = []
    for i in range(1):
        response.append(
            send_request(
                url + "/generate",
                {
                    "text": prompt_with_template,
                    "image_data": "https://farm4.staticflickr.com/3175/2653711032_804ff86d81_z.jpg",
                    "sampling_params": {
                        "max_new_tokens": 1024,
                        "temperature": 0,
                        "top_p": 1.0,
                        "presence_penalty": 2,
                        "frequency_penalty": 2,
                        "stop": "<|im_end|>",
                    },
                },
            )
        )

    rets = await asyncio.gather(*response)
    for ret in rets:
        print(ret["text"])


def test_streaming(args):
    url = f"{args.host}:{args.port}"

    prompt = "<image>\nGive detailed information."
    conv_template = copy.deepcopy(conv_qwen)
    conv_template.append_message(role="user", message=prompt)
    prompt_with_template = conv_template.get_prompt()

    pload = {
        "text": prompt_with_template,
        "sampling_params": {
            "max_new_tokens": 1024,
            "temperature": 0,
            "top_p": 1.0,
            "presence_penalty": 2,
            "frequency_penalty": 2,
            "stop": "<|im_end|>",
        },
        # "image_data": "https://farm4.staticflickr.com/3175/2653711032_804ff86d81_z.jpg",
        "image_data": "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png",
        "stream": True,
    }

    response = requests.post(
        url + "/generate",
        json=pload,
        stream=True,
    )

    # Print the server-sent-event stream incrementally as text arrives.
    prev = 0
    for chunk in response.iter_lines(decode_unicode=False):
        chunk = chunk.decode("utf-8")
        if chunk and chunk.startswith("data:"):
            if chunk == "data: [DONE]":
                break
            data = json.loads(chunk[5:].strip("\n"))
            output = data["text"].strip()
            print(output[prev:], end="", flush=True)
            prev = len(output)
    print("")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="http://0.0.0.0")
    parser.add_argument("--port", type=int, default=80)
    args = parser.parse_args()

    # asyncio.run(test_concurrent(args))
    test_streaming(args)
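(For reference: the server above listens on port 30010 while the client defaults to --host http://0.0.0.0 --port 80, so the script has to be invoked with matching arguments, along the lines of python3 http_qwen_llava_test.py --host http://127.0.0.1 --port 30010; the exact host depends on where the server runs.)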
just gives:
Big Ben
No matter how I prompt, the output is extremely terse, even when it is accurate.
I changed the image, but otherwise this is the default script from sglang: https://github.com/sgl-project/sglang/blob/main/examples/usage/llava/http_qwen_llava_test.py
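One variable worth ruling out before blaming the model: presence_penalty and frequency_penalty of 2 sit at the aggressive end of the range and penalize any token that has already appeared, which can push a model toward ending its answer very early. Below is a minimal A/B sketch with the penalties neutralized, reusing the same template as the script above; the URL and the prompt wording are placeholders, not part of the original report.

import copy

import requests
from llava.conversation import conv_qwen  # same template the script above uses

url = "http://127.0.0.1:30010"  # assumption: adjust to the actual server host/port

conv = copy.deepcopy(conv_qwen)
conv.append_message(role="user", message="<image>\nDescribe this image in detail.")
prompt_with_template = conv.get_prompt()

resp = requests.post(
    url + "/generate",
    json={
        "text": prompt_with_template,
        "image_data": "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png",
        "sampling_params": {
            "max_new_tokens": 1024,
            "temperature": 0,
            "top_p": 1.0,
            "presence_penalty": 0,   # was 2 in the original payload
            "frequency_penalty": 0,  # was 2 in the original payload
            "stop": "<|im_end|>",
        },
    },
)
print(resp.json()["text"])

If the caption stays just as terse with the penalties at 0, the sampling parameters can be crossed off the list and the problem more likely lies in the template or model setup.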
If I try increasing temperature to 0.5, I get no response at all and it just fails:
INFO: 172.16.0.42:27134 - "POST /generate HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
await func()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7dcc7e79ada0
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
| return await self.app(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
| await route.handle(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
| await self.app(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
| await response(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 258, in __call__
| async with anyio.create_task_group() as task_group:
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/home/ubuntu/sglang/python/sglang/srt/server.py", line 89, in stream_results
| async for out in tokenizer_manager.generate_request(obj, request):
| File "/home/ubuntu/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 143, in generate_request
| pixel_values, image_hash, image_size = await self.get_pixel_values(
| TypeError: cannot unpack non-iterable NoneType object
+------------------------------------
I don't understand why a TypeError shows up just because of temperature=0.5; very odd. Is that sglang's fault?
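For what it's worth, the TypeError is probably not about the temperature value itself: the last frame in the traceback is a three-way tuple unpack of whatever get_pixel_values returned, and "cannot unpack non-iterable NoneType object" means it returned None for that request, i.e. the image was never turned into pixel values. Here is a tiny self-contained sketch of that failure pattern, using hypothetical stand-in names rather than sglang's actual code, plus the kind of guard that would turn it into a readable error:

import asyncio


async def get_pixel_values_stub(image_data):
    # Hypothetical stand-in: assume the real helper returns None when the
    # image cannot be fetched or decoded, instead of a 3-tuple.
    return None


async def generate_request_stub(image_data):
    result = await get_pixel_values_stub(image_data)
    if result is None:
        # Guarded path: report the real problem instead of a bare TypeError.
        raise ValueError(f"failed to load image: {image_data!r}")
    pixel_values, image_hash, image_size = result  # the unguarded unpack from the traceback
    return pixel_values, image_hash, image_size


try:
    asyncio.run(generate_request_stub("https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png"))
except ValueError as exc:
    print(exc)  # prints a readable "failed to load image: ..." message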
A couple of extra notes. These are mostly out of scope, but they are relevant to my error logs and might be worth adding a message for:
- The OS I'm using, MicroOS, is based on Tumbleweed, but Tumbleweed itself is not transactional/immutable/atomic. So if behavior is changed purely based on that, things may misbehave. Tumbleweed itself is just standard openSUSE.
- Podman will be installed by default in most installs of MicroOS. Installing Docker requires `transactional-update` to be invoked on MicroOS (unlike Tumbleweed), so trying to remedy that manually can cause issues. Either the Podman socket should be enabled and used, or it should be a hard failure that requires the user to install Docker themselves. If MicroOS can be detected versus standard Tumbleweed (I'm sure there's a way; zypper can detect it), then this may be a safe operation to perform. You could also just check zypper's exit code. A rough detection sketch follows this list.
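Illustrative sketch of the detection idea above: the real install script is shell, and the exact /etc/os-release field values here are an assumption to verify on a live MicroOS host, but the general approach is to distinguish MicroOS from plain Tumbleweed by parsing /etc/os-release.

def read_os_release(path="/etc/os-release"):
    # Parse the simple KEY="value" lines of os-release into a dict.
    info = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                info[key] = value.strip('"')
    return info


def is_microos(info):
    # Assumption: MicroOS reports an ID like "opensuse-microos", while plain
    # Tumbleweed reports "opensuse-tumbleweed".
    fields = (info.get("ID", "") + " " + info.get("VARIANT_ID", "")).lower()
    return "microos" in fields


if __name__ == "__main__":
    print("MicroOS detected:", is_microos(read_os_release()))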
Edit: I should’ve looked at the PR first; it does everything I wanted already!
~~Oh hey! This is pretty much the exact same request I would make! Unlike the OP who added the bounty (thank you for that!), I mainly deploy on CoreOS. One thing that may help, and I don't want to complicate things too much: from my read of the install script (I'm not the best at bash, so please let me know if I'm wrong), it just attempts to install all the prerequisites. Given that it may actually be difficult to install all the prerequisites from the script, a flag to skip their installation and get right to setting up Coolify would be ideal as well. On CoreOS right now it is failing on line 94, attempting to install git, wget, and jq. It does so because CoreOS has neither DNF nor Yum and uses rpm-ostree instead; honestly, it would be easier on your end to let us weirdos using immutable systems make sure the prerequisites are already installed than to try to install them in the script. Or at least that's the path of least resistance I see.~~
This will be implemented by the primary devs in an upcoming release.