llama-cpp-python
Unable to disable "clip_model_load" log messages
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Passing verbose=False to Llama should disable log messages from llama_cpp.
Current Behavior
Log messages are leaking through from the underlying llama_chat_format module:
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
...
Environment and Context
M1 Pro - MacBook Pro
$ python3 --version
Python 3.10.13
$ make --version
GNU Make 3.81
$ g++ --version
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Failure Information (for bugs)
I believe verbose=False should suppress these log messages.
Steps to Reproduce
Use the following code:
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler
import logging

logger = logging.getLogger('llama_cpp.llama_chat_format')
logger.disabled = True

def load_llm():
    chat_handler = Llava15ChatHandler(clip_model_path="./models/llava/mmproj-model-f16.gguf")
    llm = Llama(
        model_path="./models/llava/ggml-model-q5_k.gguf",
        chat_handler=chat_handler,
        verbose=False,
        n_ctx=1024,  # n_ctx should be increased to accommodate the image embedding
        logits_all=True,  # needed to make llava work
    )
    return llm

def run(input, llm):
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are an assistant who perfectly describes images."},
            {
                "role": "user",
                "content": [
                    input,
                    {
                        "type": "text",
                        "text": "Describe this image in detail please."
                    }
                ]
            }
        ]
    )
    return response['choices'][0]['message']['content']

if __name__ == '__main__':
    input = {
        "type": "image_url",
        "image_url": {"url": "https://thumbor.forbes.com/thumbor/fit-in/900x510/https://www.forbes.com/advisor/wp-content/uploads/2023/07/top-20-small-dog-breeds.jpeg.jpg"}
    }
    llm = load_llm()
    response = run(input, llm).strip()
    print(response)
Maybe it's due to Llava15ChatHandler's __init__ method calling with suppress_stdout_stderr(disable=self.verbose). With verbose=False that also means disable=False.
@abetlen I could open a PR if that helps.
I thought so too but I took another glance at this just now and I think the way this is set up is confusing.
disable = verbose = True -> disable the suppression, make the output more verbose
disable = verbose = False -> enable the suppression, make the output less verbose
It's sort of a double negative.
# Oddly enough this works better than the contextlib version
def __enter__(self):
    if self.disable:
        # disable=True means skip the redirection entirely (i.e. stay verbose)
        return self
    ...
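To make the double negative concrete, here is a minimal, self-contained sketch of a suppress_stdout_stderr-style context manager with the same disable convention. The class name and details here are made up for illustration; this is not the library's actual code:

import os
import sys

class SuppressStdoutStderrSketch:
    # Illustrative only, not the library's implementation:
    # disable=True  -> do nothing (verbose output passes through)
    # disable=False -> redirect C-level stdout/stderr to /dev/null
    def __init__(self, disable: bool = True):
        self.disable = disable

    def __enter__(self):
        if self.disable:
            return self
        sys.stdout.flush()
        sys.stderr.flush()
        self._devnull = open(os.devnull, "w")
        self._old_stdout = os.dup(sys.stdout.fileno())
        self._old_stderr = os.dup(sys.stderr.fileno())
        os.dup2(self._devnull.fileno(), sys.stdout.fileno())
        os.dup2(self._devnull.fileno(), sys.stderr.fileno())
        return self

    def __exit__(self, *exc):
        if self.disable:
            return
        os.dup2(self._old_stdout, sys.stdout.fileno())
        os.dup2(self._old_stderr, sys.stderr.fileno())
        os.close(self._old_stdout)
        os.close(self._old_stderr)
        self._devnull.close()

# verbose=True  -> SuppressStdoutStderrSketch(disable=True)  -> nothing is hidden
# verbose=False -> SuppressStdoutStderrSketch(disable=False) -> clip_model_load output goes to /dev/null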
I wasn't explicitly setting verbose on the Llava15ChatHandler previously, as the default is verbose = False. I decided to try varying it, and it seems that regardless of what self.verbose is set to, the log lines are printed, just at different times.
I patched it locally to include log messages for the verbose setting just to be sure:
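The patch was roughly the following (a hypothetical sketch, not the actual diff; the real method bodies are elided with ..., and I'm assuming __del__ is wrapped in the same suppression context as __init__):

# Hypothetical sketch of the local debug patch in llama_chat_format.py
class Llava15ChatHandler:
    def __init__(self, clip_model_path: str, verbose: bool = False):
        print(f"__init__ verbose: {verbose}")
        self.verbose = verbose
        with suppress_stdout_stderr(disable=self.verbose):
            ...  # load the clip model as before
        print("__init__: completed")

    def __del__(self):
        print(f"__del__ verbose: {self.verbose}")
        with suppress_stdout_stderr(disable=self.verbose):
            ...  # free the clip model as before
        print("__del__: completed")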
If verbose is set to true:
__init__ verbose: True
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from ./models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
...
__init__: completed
If verbose is set to false:
__init__ verbose: False
__init__: completed
Generated: LLAVAResult(id=0, image_url='https://justinrmiller.github.io/assets/photo-gallery/eddie.jpg', generated_text='The image features a brown dog lying on the floor, resting its head on a pillow or blanket. The dog appears to be relaxed and comfortable as it lays down. There are several books scattered around the room, with some placed near the top left corner of the scene and others closer to the bottom right side. Additionally, there is a chair located in the upper right part of the image.', generation_time=19.80641816696152)
__del__ verbose: False
__del__: completed
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from ./models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
Looks like the logs that begin with "loaded meta data with ..." are not gated behind a verbosity flag in the parent llama.cpp repo: https://github.com/ggerganov/llama.cpp/blob/fbe7dfa53caff0a7e830b676e6e949917a5c71b4/examples/llava/clip.cpp#L771
So that matches your observation that no matter what verbosity we set, those logs will still show up.
As of my tests today, explicitly setting verbose=False on the chat handler resolves this. It does need to be set explicitly, though.
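For reference, that means passing the flag to the handler as well as to Llama, rather than relying on the default (same placeholder paths as in the repro above):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Pass verbose=False explicitly to the chat handler in addition to Llama.
chat_handler = Llava15ChatHandler(
    clip_model_path="./models/llava/mmproj-model-f16.gguf",
    verbose=False,
)
llm = Llama(
    model_path="./models/llava/ggml-model-q5_k.gguf",
    chat_handler=chat_handler,
    verbose=False,
    n_ctx=1024,
    logits_all=True,
)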