FastChat
Llama 3.1 - Wrong context length reported in `/token_check` endpoint
When the Llama 3.1 70B model is loaded in FastChat, the `/token_check` endpoint reports a context length of 1M (1,048,576) tokens instead of the expected 128K (131,072):
```json
{
  "prompts": [
    {
      "fits": true,
      "tokenCount": 2,
      "contextLength": 1048576
    }
  ]
}
```
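Note that 1,048,576 is exactly 8 × 131,072. A plausible explanation (a sketch, not a claim about FastChat's actual code) is a context-length computation that multiplies `max_position_embeddings` by the `rope_scaling` factor; Llama 3.1's config already stores the scaled 128K length in `max_position_embeddings`, so applying the factor again inflates it to 1M:

```python
# Illustrative sketch only, not FastChat's implementation: a naive
# context-length computation that applies the rope_scaling factor on top
# of max_position_embeddings.

def naive_context_length(config: dict) -> int:
    """Multiply the base length by the rope scaling factor, if present."""
    factor = (config.get("rope_scaling") or {}).get("factor", 1.0)
    return int(factor * config["max_position_embeddings"])

# Values matching the Llama 3.1 model config (assumed for illustration):
llama31_config = {
    "max_position_embeddings": 131072,  # already the scaled 128K length
    "rope_scaling": {"rope_type": "llama3", "factor": 8.0},
}

print(naive_context_length(llama31_config))  # 131072 * 8 = 1048576, the value reported above
```

For Llama 3.1-style (`rope_type: "llama3"`) configs, the factor should not be applied a second time; the correct value here is `max_position_embeddings` itself, i.e. 131,072.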