FastChat
Llama 3.1 - Wrong context length reported in `/token_check` endpoint
When the Llama 3.1 70B model is loaded in FastChat, the `/token_check` endpoint reports a context length of 1M (1,048,576) tokens instead of the expected 128K (131,072):
```json
{
  "prompts": [
    {
      "fits": true,
      "tokenCount": 2,
      "contextLength": 1048576
    }
  ]
}
```
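Note that 1,048,576 is exactly 8 × 131,072. A plausible explanation (a sketch, not a claim about FastChat's actual code) is a context-length computation that multiplies `max_position_embeddings` by the `rope_scaling` factor; Llama 3.1's config already stores the scaled 128K length in `max_position_embeddings`, so applying the factor again inflates it to 1M:

```python
# Illustrative sketch only, not FastChat's implementation: a naive
# context-length computation that applies the rope_scaling factor on top
# of max_position_embeddings.

def naive_context_length(config: dict) -> int:
    """Multiply the base length by the rope scaling factor, if present."""
    factor = (config.get("rope_scaling") or {}).get("factor", 1.0)
    return int(factor * config["max_position_embeddings"])

# Values matching the Llama 3.1 model config (assumed for illustration):
llama31_config = {
    "max_position_embeddings": 131072,  # already the scaled 128K length
    "rope_scaling": {"rope_type": "llama3", "factor": 8.0},
}

print(naive_context_length(llama31_config))  # 131072 * 8 = 1048576, the value reported above
```

For Llama 3.1-style (`rope_type: "llama3"`) configs, the factor should not be applied a second time; the correct value here is `max_position_embeddings` itself, i.e. 131,072.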