FinishReason enum not compatible with OpenAI API
System Info
macOS, stable Rust, master version of TGI
Reproduction
Use the async-openai crate to make a call whose generation ends on the server with eos_token (a finish reason that doesn't exist in the OpenAI API), and you'll get:
JSONDeserialize(Error("unknown variant eos_token, expected one of stop, length, tool_calls, content_filter, function_call", line: 1, column: 209))
Expected behavior
The response should deserialize successfully in strict, strongly typed clients.
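For reference, you can observe the offending value without the typed client by calling the OpenAI-compatible endpoint directly and inspecting the raw JSON. This is a minimal diagnostic sketch, assuming the reqwest (with the json feature) and serde_json crates and a local TGI server; it is not part of async-openai:

use serde_json::{json, Value};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Call the OpenAI-compatible completions endpoint directly.
    let body = json!({
        "model": "meta-llama/Meta-Llama-3-8B-Instruct",
        "prompt": "What is Hugging Face and what does it do?",
        "max_tokens": 2000
    });
    let resp: Value = reqwest::Client::new()
        .post("http://localhost:3000/v1/completions")
        .json(&body)
        .send()
        .await?
        .json()
        .await?;
    // Prints "eos_token" when generation ends on the end-of-sequence token,
    // which is the value async-openai's FinishReason enum cannot parse.
    println!("{}", resp["choices"][0]["finish_reason"]);
    Ok(())
}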
Hi @jondot, this appears to be an issue with the async-openai crate. The error thrown is JSONDeserialize, which is defined in the client library.
Additionally, for reference, I just tested the library locally and was able to get a response without any issues (not including the eos_token, as the library doesn't seem to expose that value):
use async_openai::config::OpenAIConfig;
use async_openai::types::CreateCompletionRequestArgs;
use async_openai::Client;
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    // Point the client at the local TGI server's OpenAI-compatible base URL.
    let client =
        Client::with_config(OpenAIConfig::new().with_api_base("http://localhost:3000/v1"));

    let request = CreateCompletionRequestArgs::default()
        .model("meta-llama/Meta-Llama-3-8B-Instruct")
        .prompt("What is Hugging Face and what does it do?")
        .max_tokens(40_u32)
        .build()?;

    let response = client.completions().create(request).await?;

    println!("\nResponse (single):\n");
    for choice in response.choices {
        println!("{}", choice.text);
    }

    Ok(())
}
Response (single):
🤗
Hugging Face is an AI technology company that specializes in natural language processing (NLP) and artificial intelligence (AI) research. The company has developed a range of AI models and
I understand that, yes. But look at it differently: async-openai is strongly typed, and its serde infrastructure is strict. That makes it the perfect detector for finding out what isn't adhering to the OpenAI spec. With a Python or Node.js library, you'd more than likely never bump into this until some user-side logic tried to make sense of the return values.
Now, if you're not completely compatible with the strict OpenAI API, that's a different story. But if you are, you should either:
- Be fully type-compatible with it, or
- Expose a wider API of which the OpenAI API is a non-breaking subset
In this case, returning enum values that don't exist in the original OpenAI API makes the surface a breaking non-subset, as the sketch below shows.
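To make the subset argument concrete, here is a minimal sketch (hypothetical types, not async-openai's actual definitions) of why extra response fields are non-breaking under serde's defaults while extra enum values are not:

use serde::Deserialize;

#[derive(Deserialize)]
#[allow(dead_code)]
struct Choice {
    finish_reason: FinishReason,
}

#[derive(Deserialize)]
#[serde(rename_all = "snake_case")]
enum FinishReason {
    Stop,
    Length,
}

fn main() {
    // An unknown *field* is ignored by default: a wider API that only adds
    // fields is a non-breaking superset of the spec.
    let extra_field = r#"{"finish_reason":"stop","extra":42}"#;
    assert!(serde_json::from_str::<Choice>(extra_field).is_ok());

    // An unknown enum *value* is rejected: adding finish reasons breaks
    // every strict client that enumerates the spec's variants.
    let extra_value = r#"{"finish_reason":"eos_token"}"#;
    assert!(serde_json::from_str::<Choice>(extra_value).is_err());
}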
Thanks for the quick response @jondot. I apologize, but I'm not sure I fully understand the issue or can reproduce it.
Would you be able to share an example of a request you're making that fails to be parsed by async-openai? Thank you!
Update
I've been able to reproduce this; my understanding of the issue is that the returned finish_reason is the value "eos_token" rather than "stop".
For reference:
use async_openai::config::OpenAIConfig;
use async_openai::types::CreateCompletionRequestArgs;
use async_openai::Client;
use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let client =
        Client::with_config(OpenAIConfig::new().with_api_base("http://localhost:3000/v1"));

    let request = CreateCompletionRequestArgs::default()
        .model("meta-llama/Meta-Llama-3-8B-Instruct")
        .prompt("What is Hugging Face and what does it do?")
        .max_tokens(2000_u32) // <- increased so generation ends on the EOS token
        .seed(1337)
        .build()?;

    let response = client.completions().create(request).await?;

    println!("\nResponse (single):\n");
    for choice in response.choices {
        println!("{}", choice.text);
    }

    Ok(())
}
Will follow up soon with changes to align the response with the values OpenAI expects. Thanks for noting this issue!
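For anyone following along, a rough sketch of what that alignment could look like (hypothetical type names, not TGI's actual code): map the internal finish reason onto the spec's values at the OpenAI-compatibility boundary instead of leaking eos_token.

use serde::Serialize;

// Internal reasons the server tracks (hypothetical names).
#[allow(dead_code)]
enum BackendFinishReason {
    EndOfSequenceToken,
    Length,
    StopSequence,
}

// The values the OpenAI completions spec allows for these cases.
#[derive(Serialize)]
#[serde(rename_all = "snake_case")]
enum OpenAiFinishReason {
    Stop,
    Length,
}

impl From<BackendFinishReason> for OpenAiFinishReason {
    fn from(reason: BackendFinishReason) -> Self {
        match reason {
            // Hitting an EOS token or a stop sequence both count as a
            // natural "stop" in OpenAI terms.
            BackendFinishReason::EndOfSequenceToken
            | BackendFinishReason::StopSequence => OpenAiFinishReason::Stop,
            BackendFinishReason::Length => OpenAiFinishReason::Length,
        }
    }
}

Serializing the converted enum then yields "stop", which strict clients like async-openai parse without issue.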