[Tracking] WebLLM: OpenAI-Compatible APIs in ChatModule
Overview
The goal of this task is to implement APIs that are compatible with the OpenAI API. Existing APIs like `generate()` will be kept. Essentially we want JSON-in and JSON-out, resulting in usage like:
```typescript
import * as webllm from "@mlc-ai/web-llm";

async function main() {
  const chat = new webllm.ChatModule();
  await chat.reload("Llama-2-7b-chat-hf-q4f32_1");
  const completion = await chat.chat_completion({
    messages: [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" },
    ],
    // optional generation configs here
  });
  console.log(completion.choices[0]);
}

main();
```
If streaming:
```typescript
const completion = await chat.chat_completion({
  messages: [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" },
  ],
  stream: true,
  // optional generation configs here
});

for await (const chunk of completion) {
  console.log(chunk.choices[0].delta.content);
}
```
Action items
- [x] O1: Implement the basic `chat_completion()` (both streaming and non-streaming), supporting configs/features that we do not currently have inside `llm_chat.ts`
  - https://github.com/mlc-ai/web-llm/pull/298
  - https://github.com/mlc-ai/web-llm/pull/300
- [ ] O2: Support function calling (`tools`); see the sketch after this list
- [ ] O3: Documentation and tests for the WebLLM repo
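For O2, a minimal sketch of what a function-calling request could look like, assuming WebLLM adopts the `tools` / `tool_choice` / `tool_calls` shapes from OpenAI's `openai-node`; the exact WebLLM surface is still undecided, and `get_weather` is a hypothetical app-provided function:

```typescript
// Sketch only: assumes the request/response mirror OpenAI's tools shapes.
const completion = await chat.chat_completion({
  messages: [{ "role": "user", "content": "What is the weather in Tokyo?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical function exposed by the app
        description: "Get the current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
  tool_choice: "auto",
});
// In OpenAI's response shape, requested calls appear here:
console.log(completion.choices[0].message.tool_calls);
```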
Existing gaps
There are some fields/features that are not yet supported in WebLLM compared to OpenAI's `openai-node`.
Fields in `ChatCompletionRequest`
- `model`: in WebLLM, we need to call `reload(model)` instead of making it an argument in `ChatCompletionRequest` (see the sketch after this list)
- `response_format` (JSON formatting)
- Function calling related:
  - `tool_choice`
  - `tools`
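A sketch of the `model` difference, reusing the usage from the overview above; the `openai` client call is shown in comments only for contrast:

```typescript
const messages = [{ "role": "user", "content": "Hello!" }];

// OpenAI: the model is a per-request field.
// const completion = await openai.chat.completions.create({
//   model: "gpt-3.5-turbo",
//   messages,
// });

// WebLLM: the model is selected once via reload(), then omitted
// from each ChatCompletionRequest.
await chat.reload("Llama-2-7b-chat-hf-q4f32_1");
const completion = await chat.chat_completion({ messages });
```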
Fields in `ChatCompletion` response
- `system_fingerprint`: not applicable in our case (OpenAI needs it because they process requests remotely on their servers)
Others
- We do not support `n > 1` when streaming, since `llm_chat.ts` does not support maintaining multiple sequences. We would have to finish one sequence and then start generating another, which conflicts with the goal of streaming in chunks (see the sketch below).
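A sketch of the resulting constraint, assuming the `n` field follows OpenAI's meaning (number of choices to generate per request):

```typescript
// Sketch: n > 1 is only meaningful without streaming, since llm_chat.ts
// generates one sequence at a time.
const completion = await chat.chat_completion({
  messages: [{ "role": "user", "content": "Hello!" }],
  n: 2, // two choices, generated one after another internally
  // stream: true together with n > 1 is not supported
});
console.log(completion.choices.map((choice) => choice.message.content));
```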
Future Items
- Support chat completion with image inputs (e.g. LLaVA), with a Gradio frontend
- Add support for low-level APIs for post-forward logit processing (see the sketch after this list)
  - Supported here: https://github.com/mlc-ai/web-llm/pull/277
- Support embedding models
- More modalities such as audio
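For the logit-processing item, a minimal sketch of what such a hook could look like; the `LogitProcessor` interface name and method shape here are assumptions for illustration only, not the final API (see the PR above for the actual implementation):

```typescript
// Hypothetical interface for illustration: a callback invoked after each
// forward pass, before sampling, so callers can bias or mask logits.
interface LogitProcessor {
  processLogits(logits: Float32Array): Float32Array;
}

// Example: forbid a specific token id by setting its logit to -Infinity.
class BanTokenProcessor implements LogitProcessor {
  constructor(private bannedTokenId: number) {}
  processLogits(logits: Float32Array): Float32Array {
    logits[this.bannedTokenId] = -Infinity;
    return logits;
  }
}
```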
@CharlieFRuan Thanks for creating the tracking issue. Just wanted to let you know that @shreygupta2809 and I are currently working on supporting function calling (O2).