
[Question] mlc-llm server cannot return correct logprobs

kunxiongzhu opened this issue 8 months ago · 12 comments

❓ General Questions

Steps to reproduce the behavior:

mlc_llm serve --model-lib /mnt/data/ehdd1/home/models/mlc/libs/Llama-2-7b-chat-hf-q0f16-O0-cuda.so /mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/

python test.py

test.py is as follows:

import requests
import json

MLC_SERVER_URL = "http://127.0.0.1:8000/v1/completions"

# Ask the server for the top-5 logprobs of each generated token, using greedy decoding.
request_payload = {
    "model": "/mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/", 
    "prompt": "1 + 1 =",
    "max_tokens": 5,
    "logprobs": 5, 
    "temperature": 0,
    "top_p": 1,
}

response = requests.post(MLC_SERVER_URL, json=request_payload)

if response.status_code == 200:
    result = response.json()
    print(json.dumps(result, indent=4))  

    if "choices" in result:
        for choice in result["choices"]:
            if "logprobs" in choice and choice["logprobs"] is not None:
                print("\n Successfully retrieved logprobs:\n", json.dumps(choice["logprobs"], indent=4))
            else:
                print("\n logprobs is empty, MLC might not have computed logprobs.")
else:
    print(f"\n Request failed: {response.status_code}")
    print(response.text)

Expected behavior

I am using the above script to test the returned logprobs. However, when I run mlc_llm serve, the returned logprobs are always empty, even though the tokens themselves are generated correctly. This prevents me from using the lm_eval tool to evaluate the model's performance.

Using the same script against a llama-cpp-python server, the logprobs were generated correctly, and I was able to run lm_eval on the model without problems, which shows that neither my script nor the model is the issue. Note that llama-cpp-python exposes the same OpenAI-compatible API as mlc-llm.

Therefore, I am wondering if there is a bug in mlc-llm that prevents it from returning logprobs correctly.
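
For reference, the same request can also be issued through the official openai Python client, which talks to the same OpenAI-compatible completions endpoint that lm_eval relies on. A minimal sketch, assuming the same local server URL and model path as above; the api_key value is just a placeholder since the local server does not check it:

from openai import OpenAI

# Point the client at the local MLC server; the key is a placeholder.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="none")

resp = client.completions.create(
    model="/mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/",
    prompt="1 + 1 =",
    max_tokens=5,
    logprobs=5,      # request top-5 logprobs per generated token
    temperature=0,
)

# A correct server should return non-empty tokens with finite logprob values here.
print(resp.choices[0].logprobs)

If this also comes back with an empty tokens list against the mlc-llm server, that points to the server-side logprob computation rather than to how the request is formed.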

Output from llama-cpp-python (the expected result, with correct logprobs):

{
    "id": "cmpl-6fad5dd2-0c56-4cf3-80be-580028c487dc",
    "object": "text_completion",
    "created": 1739426316,
    "model": "/mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/",
    "choices": [
        {
            "text": " 2\n\n2",
            "index": 0,
            "logprobs": {
                "text_offset": [
                    7,
                    8,
                    9,
                    10,
                    11
                ],
                "token_logprobs": [
                    -0.12357338517904282,
                    -0.09533079713582993,
                    -0.586503267288208,
                    -0.46185949444770813,
                    -1.2555230855941772
                ],
                "tokens": [
                    " ",
                    "2",
                    "\n",
                    "\n",
                    "2"
                ],
                "top_logprobs": [
                    {
                        " ": -0.12357338517904282,
                        " ?": -3.4502124786376953,
                        "  ": -3.5839529037475586,
                        "\n": -4.564411163330078,
                        " _": -5.367070198059082
                    },
                    {
                        "2": -0.09533079713582993,
                        "3": -3.1487088203430176,
                        "1": -3.6742844581604004,
                        "4": -5.479968547821045,
                        "0": -6.0776143074035645
                    },
                    {
                        "\n": -0.586503267288208,
                        ",": -2.135464906692505,
                        ".": -2.315242052078247,
                        " ": -3.1089794635772705,
                        " (": -3.291478395462036
                    },
                    {
                        "\n": -0.46185949444770813,
                        "```": -2.412043333053589,
                        " ": -3.3926141262054443,
                        "   ": -3.8115532398223877,
                        "(": -4.226720809936523
                    },
                    {
                        "2": -1.2555230855941772,
                        "This": -2.669368267059326,
                        "But": -2.814023494720459,
                        "1": -3.256460666656494,
                        "3": -3.4381399154663086
                    }
                ]
            },
            "finish_reason": "length"
        }
    ],
    "usage": {
        "prompt_tokens": 7,
        "completion_tokens": 5,
        "total_tokens": 12
    }
}

 Successfully retrieved logprobs:
 {
    "text_offset": [
        7,
        8,
        9,
        10,
        11
    ],
    "token_logprobs": [
        -0.12357338517904282,
        -0.09533079713582993,
        -0.586503267288208,
        -0.46185949444770813,
        -1.2555230855941772
    ],
    "tokens": [
        " ",
        "2",
        "\n",
        "\n",
        "2"
    ],
    "top_logprobs": [
        {
            " ": -0.12357338517904282,
            " ?": -3.4502124786376953,
            "  ": -3.5839529037475586,
            "\n": -4.564411163330078,
            " _": -5.367070198059082
        },
        {
            "2": -0.09533079713582993,
            "3": -3.1487088203430176,
            "1": -3.6742844581604004,
            "4": -5.479968547821045,
            "0": -6.0776143074035645
        },
        {
            "\n": -0.586503267288208,
            ",": -2.135464906692505,
            ".": -2.315242052078247,
            " ": -3.1089794635772705,
            " (": -3.291478395462036
        },
        {
            "\n": -0.46185949444770813,
            "```": -2.412043333053589,
            " ": -3.3926141262054443,
            "   ": -3.8115532398223877,
            "(": -4.226720809936523
        },
        {
            "2": -1.2555230855941772,
            "This": -2.669368267059326,
            "But": -2.814023494720459,
            "1": -3.256460666656494,
            "3": -3.4381399154663086
        }
    ]
}

Output from mlc-llm (incorrect):

{
    "id": "cmpl-f5c2dd151fcc4a9dad8efd82d4d1c5d1",
    "choices": [
        {
            "finish_reason": "length",
            "index": 0,
            "logprobs": {
                "text_offset": null,
                "token_logprobs": [
                    0.0
                ],
                "tokens": [
                    ""
                ],
                "top_logprobs": [
                    {
                        "": -23.0259,
                        "<unk>": -23.0259,
                        "<s>": -23.0259,
                        "</s>": -23.0259
                    }
                ]
            },
            "text": "2.\n\n"
        }
    ],
    "created": 1739426790,
    "model": "/mnt/data/ehdd1/home/models/mlc/Llama-2-7b-chat-hf-q0f16-MLC/",
    "object": "text_completion",
    "usage": {
        "prompt_tokens": 6,
        "completion_tokens": 5,
        "total_tokens": 11,
        "extra": null
    }
}

 Successfully retrieved logprobs:
 {
    "text_offset": null,
    "token_logprobs": [
        0.0
    ],
    "tokens": [
        ""
    ],
    "top_logprobs": [
        {
            "": -23.0259,
            "<unk>": -23.0259,
            "<s>": -23.0259,
            "</s>": -23.0259
        }
    ]
}

Environment

Platform: CUDA
Operating system: Ubuntu
Device: NVIDIA A6000
How you installed MLC-LLM: source

kunxiongzhu · Feb 19 '25 15:02