LLaVA-NeXT
Poor quality output for Qwen 72B
server:
export CUDA_VISIBLE_DEVICES="3,4,5,6"
python -m sglang.launch_server --model-path lmms-lab/llava-next-72b --tokenizer-path lmms-lab/llavanext-qwen-tokenizer --port=30010 --host="0.0.0.0" --tp-size=4 --random-seed=1234 --context-length=32768 &> 72b.log &
client:
"""
Usage:
# Installing latest llava-next: pip install git+https://github.com/LLaVA-VL/LLaVA-NeXT.git
# Installing latest sglang.
# Endpoint Service CLI:
# python -m sglang.launch_server --model-path lmms-lab/llava-next-72b --tokenizer-path lmms-lab/llavanext-qwen-tokenizer --port=30000 --host="127.0.0.1" --tp-size=4
python3 http_qwen_llava_test.py
Output:
"Two children pose with a large teddy bear, one holding a smaller stuffed bear, in a room with an American flag and potted plants."
"""
import argparse
import asyncio
import json
import time
import copy
import aiohttp
import requests
from llava.conversation import (
    default_conversation,
    conv_templates,
    SeparatorStyle,
    conv_llava_llama_3,
    conv_qwen,
)


async def send_request(url, data, delay=0):
    await asyncio.sleep(delay)
    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=data) as resp:
            output = await resp.json()
    return output


async def test_concurrent(args):
    url = f"{args.host}:{args.port}"

    # Build a ChatML-style prompt from the Qwen conversation template.
    prompt = "<image>\nPlease generate caption towards this image."
    conv_template = copy.deepcopy(conv_qwen)
    conv_template.append_message(role="user", message=prompt)
    prompt_with_template = conv_template.get_prompt()

    response = []
    for i in range(1):
        response.append(
            send_request(
                url + "/generate",
                {
                    "text": prompt_with_template,
                    "image_data": "https://farm4.staticflickr.com/3175/2653711032_804ff86d81_z.jpg",
                    "sampling_params": {
                        "max_new_tokens": 1024,
                        "temperature": 0,
                        "top_p": 1.0,
                        "presence_penalty": 2,
                        "frequency_penalty": 2,
                        "stop": "<|im_end|>",
                    },
                },
            )
        )

    rets = await asyncio.gather(*response)
    for ret in rets:
        print(ret["text"])


def test_streaming(args):
    url = f"{args.host}:{args.port}"

    prompt = "<image>\nGive detailed information."
    conv_template = copy.deepcopy(conv_qwen)
    conv_template.append_message(role="user", message=prompt)
    prompt_with_template = conv_template.get_prompt()

    pload = {
        "text": prompt_with_template,
        "sampling_params": {
            "max_new_tokens": 1024,
            "temperature": 0,
            "top_p": 1.0,
            "presence_penalty": 2,
            "frequency_penalty": 2,
            "stop": "<|im_end|>",
        },
        # "image_data": "https://farm4.staticflickr.com/3175/2653711032_804ff86d81_z.jpg",
        "image_data": "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png",
        "stream": True,
    }

    response = requests.post(
        url + "/generate",
        json=pload,
        stream=True,
    )

    # Print the server-sent-event stream incrementally as text arrives.
    prev = 0
    for chunk in response.iter_lines(decode_unicode=False):
        chunk = chunk.decode("utf-8")
        if chunk and chunk.startswith("data:"):
            if chunk == "data: [DONE]":
                break
            data = json.loads(chunk[5:].strip("\n"))
            output = data["text"].strip()
            print(output[prev:], end="", flush=True)
            prev = len(output)
    print("")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--host", type=str, default="http://0.0.0.0")
    parser.add_argument("--port", type=int, default=80)
    args = parser.parse_args()

    # asyncio.run(test_concurrent(args))
    test_streaming(args)
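(For reference: the server above listens on port 30010 while the client defaults to --host http://0.0.0.0 --port 80, so the script has to be invoked with matching arguments, along the lines of python3 http_qwen_llava_test.py --host http://127.0.0.1 --port 30010; the exact host depends on where the server runs.)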
just gives:
Big Ben
No matter how I prompt, the output is extremely terse, even when it is accurate.
I changed the image, but otherwise this is the default script from sglang: https://github.com/sgl-project/sglang/blob/main/examples/usage/llava/http_qwen_llava_test.py
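One variable worth ruling out before blaming the model: presence_penalty and frequency_penalty of 2 sit at the aggressive end of the range and penalize any token that has already appeared, which can push a model toward ending its answer very early. Below is a minimal A/B sketch with the penalties neutralized, reusing the same template as the script above; the URL and the prompt wording are placeholders, not part of the original report.

import copy

import requests
from llava.conversation import conv_qwen  # same template the script above uses

url = "http://127.0.0.1:30010"  # assumption: adjust to the actual server host/port

conv = copy.deepcopy(conv_qwen)
conv.append_message(role="user", message="<image>\nDescribe this image in detail.")
prompt_with_template = conv.get_prompt()

resp = requests.post(
    url + "/generate",
    json={
        "text": prompt_with_template,
        "image_data": "https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png",
        "sampling_params": {
            "max_new_tokens": 1024,
            "temperature": 0,
            "top_p": 1.0,
            "presence_penalty": 0,   # was 2 in the original payload
            "frequency_penalty": 0,  # was 2 in the original payload
            "stop": "<|im_end|>",
        },
    },
)
print(resp.json()["text"])

If the caption stays just as terse with the penalties at 0, the sampling parameters can be crossed off the list and the problem more likely lies in the template or model setup.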
If I try increasing temperature to 0.5, I get no response at all and it just fails:
INFO: 172.16.0.42:27134 - "POST /generate HTTP/1.1" 200 OK
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 265, in __call__
await wrap(partial(self.listen_for_disconnect, receive))
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
await func()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 238, in listen_for_disconnect
message = await receive()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 568, in receive
await self.message_event.wait()
File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/asyncio/locks.py", line 214, in wait
await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7dcc7e79ada0
During handling of the above exception, another exception occurred:
+ Exception Group Traceback (most recent call last):
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
| result = await app( # type: ignore[func-returns-value]
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
| return await self.app(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
| await super().__call__(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/errors.py", line 186, in __call__
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
| await self.app(scope, receive, _send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
| await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
| await self.middleware_stack(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
| await route.handle(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
| await self.app(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
| await wrap_app_handling_exceptions(app, request)(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
| raise exc
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
| await app(scope, receive, sender)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/routing.py", line 75, in app
| await response(scope, receive, send)
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 258, in __call__
| async with anyio.create_task_group() as task_group:
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 678, in __aexit__
| raise BaseExceptionGroup(
| exceptiongroup.ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
+-+---------------- 1 ----------------
| Traceback (most recent call last):
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/home/ubuntu/miniconda3/envs/sglang/lib/python3.10/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/home/ubuntu/sglang/python/sglang/srt/server.py", line 89, in stream_results
| async for out in tokenizer_manager.generate_request(obj, request):
| File "/home/ubuntu/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 143, in generate_request
| pixel_values, image_hash, image_size = await self.get_pixel_values(
| TypeError: cannot unpack non-iterable NoneType object
+------------------------------------
I don't understand why a TypeError shows up just because of temperature=0.5; very odd. Is that sglang's fault?
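For what it's worth, the TypeError is probably not about the temperature value itself: the last frame in the traceback is a three-way tuple unpack of whatever get_pixel_values returned, and "cannot unpack non-iterable NoneType object" means it returned None for that request, i.e. the image was never turned into pixel values. Here is a tiny self-contained sketch of that failure pattern, using hypothetical stand-in names rather than sglang's actual code, plus the kind of guard that would turn it into a readable error:

import asyncio


async def get_pixel_values_stub(image_data):
    # Hypothetical stand-in: assume the real helper returns None when the
    # image cannot be fetched or decoded, instead of a 3-tuple.
    return None


async def generate_request_stub(image_data):
    result = await get_pixel_values_stub(image_data)
    if result is None:
        # Guarded path: report the real problem instead of a bare TypeError.
        raise ValueError(f"failed to load image: {image_data!r}")
    pixel_values, image_hash, image_size = result  # the unguarded unpack from the traceback
    return pixel_values, image_hash, image_size


try:
    asyncio.run(generate_request_stub("https://h2o-release.s3.amazonaws.com/h2ogpt/bigben.png"))
except ValueError as exc:
    print(exc)  # prints a readable "failed to load image: ..." message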
A couple of extra notes. These are mostly out of scope, but they are relevant to my error logs and might be worth adding a message for:
- The OS I'm using, MicroOS, is based on Tumbleweed, but Tumbleweed itself is not transactional/immutable/atomic. So if behavior is changed purely based on that, things may misbehave. Tumbleweed itself is just standard openSUSE.
- Podman will be installed by default in most installs of MicroOS. Installing Docker requires `transactional-update` to be invoked on MicroOS (unlike Tumbleweed), so trying to remedy that manually can cause issues. Either the Podman socket should be enabled and used, or it should be a hard failure that requires the user to install Docker themselves. If MicroOS can be detected versus standard Tumbleweed (I'm sure there's a way; zypper can detect it), then this may be a safe operation to perform. You could also just check zypper's exit code. A rough detection sketch follows this list.
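Illustrative sketch of the detection idea above: the real install script is shell, and the exact /etc/os-release field values here are an assumption to verify on a live MicroOS host, but the general approach is to distinguish MicroOS from plain Tumbleweed by parsing /etc/os-release.

def read_os_release(path="/etc/os-release"):
    # Parse the simple KEY="value" lines of os-release into a dict.
    info = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                info[key] = value.strip('"')
    return info


def is_microos(info):
    # Assumption: MicroOS reports an ID like "opensuse-microos", while plain
    # Tumbleweed reports "opensuse-tumbleweed".
    fields = (info.get("ID", "") + " " + info.get("VARIANT_ID", "")).lower()
    return "microos" in fields


if __name__ == "__main__":
    print("MicroOS detected:", is_microos(read_os_release()))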
Edit: I should’ve looked at the PR first; it does everything I wanted already!
~~Oh hey! This is pretty much the exact same request I would make! Unlike the OP who added the bounty (thank you for that!), I mainly deploy on CoreOS. One thing that may help, and I don't want to complicate things too much: from my read of the install script (I'm not the best at bash, so please let me know if I'm wrong), it just attempts to install all the prerequisites. Given that it may actually be difficult to install all the prerequisites from the script, a flag to skip their installation and get right to setting up Coolify would be ideal as well. On CoreOS right now it is failing on line 94, attempting to install git, wget, and jq. It does so because CoreOS has neither DNF nor Yum and uses rpm-ostree instead; honestly, it would be easier on your end to let us weirdos using immutable systems make sure the prerequisites are already installed than to try to install them in the script. Or at least that's the path of least resistance I see.~~
This will be implemented by the primary devs in an upcoming release.