llama-cpp-python
Unable to disable "clip_model_load" log messages
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
Passing verbose=False to Llama should disable log messages from llama_cpp.
Current Behavior
Log messages are leaking through from the underlying llama_chat_format module:
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
...
Environment and Context
M1 Pro - MacBook Pro
$ python3 --version
Python 3.10.13
$ make --version
GNU Make 3.81
$ g++ --version
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Failure Information (for bugs)
I believe verbose=False should suppress these log messages.
Steps to Reproduce
Use the following code:
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler
import logging

logger = logging.getLogger('llama_cpp.llama_chat_format')
logger.disabled = True

def load_llm():
    chat_handler = Llava15ChatHandler(clip_model_path="./models/llava/mmproj-model-f16.gguf")
    llm = Llama(
        model_path="./models/llava/ggml-model-q5_k.gguf",
        chat_handler=chat_handler,
        verbose=False,
        n_ctx=1024,  # n_ctx should be increased to accommodate the image embedding
        logits_all=True,  # needed to make llava work
    )
    return llm

def run(input, llm):
    response = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You are an assistant who perfectly describes images."},
            {
                "role": "user",
                "content": [
                    input,
                    {
                        "type": "text",
                        "text": "Describe this image in detail please."
                    }
                ]
            }
        ]
    )
    return response['choices'][0]['message']['content']

if __name__ == '__main__':
    input = {
        "type": "image_url",
        "image_url": {"url": "https://thumbor.forbes.com/thumbor/fit-in/900x510/https://www.forbes.com/advisor/wp-content/uploads/2023/07/top-20-small-dog-breeds.jpeg.jpg"}
    }
    llm = load_llm()
    response = run(input, llm).strip()
    print(response)
Maybe it's due to Llava15ChatHandler's __init__ method calling with suppress_stdout_stderr(disable=self.verbose). With verbose=False that also means disable=False.
@abetlen I could open a PR if that helps.
I thought so too but I took another glance at this just now and I think the way this is set up is confusing.
disable = verbose = True -> disable the suppression, make the output more verbose
disable = verbose = False -> enable the suppression, make the output less verbose
It's sort of a double negative.
# Oddly enough this works better than the contextlib version
def __enter__(self):
    if self.disable:
        # disable=True means skip the redirection entirely (i.e. stay verbose)
        return self
    ...
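To make the double negative concrete, here is a minimal, self-contained sketch of a suppress_stdout_stderr-style context manager with the same disable convention. The class name and details here are made up for illustration; this is not the library's actual code:

import os
import sys

class SuppressStdoutStderrSketch:
    # Illustrative only, not the library's implementation:
    # disable=True  -> do nothing (verbose output passes through)
    # disable=False -> redirect C-level stdout/stderr to /dev/null
    def __init__(self, disable: bool = True):
        self.disable = disable

    def __enter__(self):
        if self.disable:
            return self
        sys.stdout.flush()
        sys.stderr.flush()
        self._devnull = open(os.devnull, "w")
        self._old_stdout = os.dup(sys.stdout.fileno())
        self._old_stderr = os.dup(sys.stderr.fileno())
        os.dup2(self._devnull.fileno(), sys.stdout.fileno())
        os.dup2(self._devnull.fileno(), sys.stderr.fileno())
        return self

    def __exit__(self, *exc):
        if self.disable:
            return
        os.dup2(self._old_stdout, sys.stdout.fileno())
        os.dup2(self._old_stderr, sys.stderr.fileno())
        os.close(self._old_stdout)
        os.close(self._old_stderr)
        self._devnull.close()

# verbose=True  -> SuppressStdoutStderrSketch(disable=True)  -> nothing is hidden
# verbose=False -> SuppressStdoutStderrSketch(disable=False) -> clip_model_load output goes to /dev/null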
I wasn't explicitly setting verbose on the Llava15ChatHandler previously, as the default is verbose = False. I decided to try varying it, and it seems that regardless of what self.verbose is set to, the log lines are printed, just at different times.
I patched it locally to include log messages for the verbose setting just to be sure:
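The patch was roughly the following (a hypothetical sketch, not the actual diff; the real method bodies are elided with ..., and I'm assuming __del__ is wrapped in the same suppression context as __init__):

# Hypothetical sketch of the local debug patch in llama_chat_format.py
class Llava15ChatHandler:
    def __init__(self, clip_model_path: str, verbose: bool = False):
        print(f"__init__ verbose: {verbose}")
        self.verbose = verbose
        with suppress_stdout_stderr(disable=self.verbose):
            ...  # load the clip model as before
        print("__init__: completed")

    def __del__(self):
        print(f"__del__ verbose: {self.verbose}")
        with suppress_stdout_stderr(disable=self.verbose):
            ...  # free the clip model as before
        print("__del__: completed")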
If verbose is set to true:
__init__ verbose: True
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from ./models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
...
__init__: completed
If verbose is set to false:
__init__ verbose: False
__init__: completed
Generated: LLAVAResult(id=0, image_url='https://justinrmiller.github.io/assets/photo-gallery/eddie.jpg', generated_text='The image features a brown dog lying on the floor, resting its head on a pillow or blanket. The dog appears to be relaxed and comfortable as it lays down. There are several books scattered around the room, with some placed near the top left corner of the scene and others closer to the bottom right side. Additionally, there is a chair located in the upper right part of the image.', generation_time=19.80641816696152)
__del__ verbose: False
__del__: completed
clip_model_load: loaded meta data with 18 key-value pairs and 377 tensors from ./models/llava/mmproj-model-f16.gguf
clip_model_load: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
clip_model_load: - kv 0: general.architecture str = clip
clip_model_load: - kv 1: clip.has_text_encoder bool = false
clip_model_load: - kv 2: clip.has_vision_encoder bool = true
Looks like the logs that begin with "loaded meta data with ..." are not gated behind a verbosity flag in the parent llama.cpp repo: https://github.com/ggerganov/llama.cpp/blob/fbe7dfa53caff0a7e830b676e6e949917a5c71b4/examples/llava/clip.cpp#L771
So that matches your observation that no matter what verbosity we set, those logs will still show up.
As of my tests today, explicitly setting verbose=False on the chat handler resolves this. It does need to be set explicitly, though.
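For reference, that means passing the flag to the handler as well as to Llama, rather than relying on the default (same placeholder paths as in the repro above):

from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Pass verbose=False explicitly to the chat handler in addition to Llama.
chat_handler = Llava15ChatHandler(
    clip_model_path="./models/llava/mmproj-model-f16.gguf",
    verbose=False,
)
llm = Llama(
    model_path="./models/llava/ggml-model-q5_k.gguf",
    chat_handler=chat_handler,
    verbose=False,
    n_ctx=1024,
    logits_all=True,
)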