
[Tracking] WebLLM: OpenAI-Compatible APIs in ChatModule

Open · CharlieFRuan opened this issue 1 year ago • 1 comment

Overview

The goal of this task is to implement APIs that are compatible with the OpenAI API. Existing APIs like generate() will still be kept. Essentially we want JSON in, JSON out, resulting in usage like:

import * as webllm from "@mlc-ai/web-llm";

async function main() {
  const chat = new webllm.ChatModule();
  await chat.reload("Llama-2-7b-chat-hf-q4f32_1");

  const completion = await chat.chat_completion({
    messages: [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ],
    // optional generative configs here
  });

  console.log(completion.choices[0]);
}

main();

If streaming:

  const completion = await chat.chat_completion({
    messages: [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Hello!" }
    ],
    stream: true,
    // optional generative configs here
  });

  for await (const chunk of completion) {
    console.log(chunk.choices[0].delta.content);
  }

Action items

  • [x] O1: Implement the basic chat_completion() (both streaming and non-streaming), and support configs/features that we do not currently have inside llm_chat.ts
    • https://github.com/mlc-ai/web-llm/pull/298
    • https://github.com/mlc-ai/web-llm/pull/300
  • [ ] O2: Support function calling (tools)
  • [ ] O3: Documentation and tests for the WebLLM repo

Existing gaps

There are some fields/features that are not yet supported in WebLLM compared to OpenAI's openai-node.

Fields in ChatCompletionRequest

  • model: in WebLLM, we need to call reload(model) instead of making it an argument in ChatCompletionRequest
  • response_format (JSON formatting)
  • function calling related (see the sketch after this list):
    • tool_choice
    • tools
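
For reference, the sketch below shows how these fields look in an OpenAI-style request. The exact WebLLM surface is still to be decided, and the get_weather tool and its parameter schema are purely illustrative.

  // Illustrative only: OpenAI-style request shape that chat_completion() would
  // need to accept once response_format and tools are supported. The
  // "get_weather" function and its parameter schema are hypothetical.
  const completion = await chat.chat_completion({
    messages: [
      { "role": "user", "content": "What is the weather in Tokyo? Reply in JSON." }
    ],
    response_format: { type: "json_object" },
    tools: [
      {
        type: "function",
        function: {
          name: "get_weather",
          description: "Get the current weather for a city",
          parameters: {
            type: "object",
            properties: { city: { type: "string" } },
            required: ["city"]
          }
        }
      }
    ],
    tool_choice: "auto"
  });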

Fields in the ChatCompletion response

  • system_fingerprint: not applicable in our case (OpenAI needs it because requests are performed remotely on their servers)

Others

  • We do not support n > 1 when streaming, since llm_chat.ts cannot maintain multiple sequences: we have to finish one sequence before starting to generate another, which conflicts with the goal of streaming in chunks. See the sketch below.
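
To illustrate the distinction, a hypothetical non-streaming request with n: 2 can return both sequences in choices, since each one is generated to completion before the next starts. The field names follow the OpenAI shape; the exact behavior in WebLLM is an assumption here.

  // Hypothetical: n > 1 without streaming generates each sequence to
  // completion, one after another, and returns them together.
  const completion = await chat.chat_completion({
    messages: [
      { "role": "user", "content": "Tell me a joke." }
    ],
    n: 2,            // two independent completions
    // stream: true, // not supported together with n > 1 (see above)
  });

  // choices[0] and choices[1] hold the two generated sequences.
  console.log(completion.choices.map((c) => c.message.content));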

Future Items

  • Support chat completion with image inputs (e.g. LLaVA), with a Gradio frontend
  • Add support for low-level APIs for post-forward logit processing (see the sketch after this list)
    • Supported here: https://github.com/mlc-ai/web-llm/pull/277
  • Support embedding models
  • More modalities such as Audio
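
As a rough illustration of the post-forward logit-processing hook mentioned above: the interface and method names below are assumptions made for this sketch only; the actual API should be checked against https://github.com/mlc-ai/web-llm/pull/277.

  // Sketch only: the interface and method names are illustrative, not the real API.
  interface MyLogitProcessor {
    processLogits(logits: Float32Array): Float32Array;
    processSampledToken(token: number): void;
    resetState(): void;
  }

  // Example processor that boosts one token id before sampling.
  class BoostTokenProcessor implements MyLogitProcessor {
    constructor(private tokenId: number, private bonus: number) {}

    processLogits(logits: Float32Array): Float32Array {
      logits[this.tokenId] += this.bonus;  // nudge sampling toward tokenId
      return logits;
    }

    processSampledToken(token: number): void {
      // Track sampled tokens here if stateful processing is needed.
    }

    resetState(): void {}
  }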

CharlieFRuan avatar Jan 28 '24 22:01 CharlieFRuan

@CharlieFRuan Thanks for creating the tracking issue. Just wanted to let you know that @shreygupta2809 and I are currently working on supporting function calling.

Kartik14 avatar Feb 02 '24 21:02 Kartik14