Implement chat & chat streaming for Anthropic in Deployments
🛠 DevTools 🛠
Install mlflow from this PR
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/11195/merge
Checkout with GitHub CLI
gh pr checkout 11195
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Implement chat & chat streaming endpoints for Anthropic provider in MLflow Deployments Server
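For context, the translation works roughly as sketched below. This is a minimal, hypothetical illustration only, not the actual provider code added in this PR: Anthropic's Messages API takes user/assistant turns in `messages` and the system prompt as a separate top-level `system` field, so an OpenAI-style chat payload has to be split before the request is forwarded.

# Hypothetical sketch of the chat payload translation; names and structure are
# illustrative and do not mirror the provider module added in this PR.
def to_anthropic_chat_payload(payload: dict) -> dict:
    messages = payload["messages"]
    # Anthropic's Messages API expects the system prompt as a top-level field,
    # not as a message with role "system".
    system = "\n".join(m["content"] for m in messages if m["role"] == "system")
    chat_messages = [m for m in messages if m["role"] != "system"]
    request = {
        "model": payload.get("model", "claude-2.1"),
        "messages": chat_messages,
        "max_tokens": payload.get("max_tokens", 1024),
    }
    if system:
        request["system"] = system
    if "temperature" in payload:
        request["temperature"] = payload["temperature"]
    if payload.get("stream"):
        request["stream"] = True
    return request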
How is this PR tested?
- [ ] Existing unit/integration tests
- [X] New unit/integration tests
- [ ] Manual tests
Does this PR require documentation update?
- [ ] No. You can skip the rest of this section.
- [X] Yes. I've updated:
- [ ] Examples
- [X] API references
- [ ] Instructions
Release Notes
Is this a user-facing change?
- [ ] No. You can skip the rest of this section.
- [X] Yes. Give a description of this change to be included in the release notes for MLflow users.
Implement chat & chat streaming endpoints for Anthropic provider in MLflow Deployments Server
What component(s), interfaces, languages, and integrations does this PR affect?
Components
- [ ] area/artifacts: Artifact stores and artifact logging
- [ ] area/build: Build and test infrastructure for MLflow
- [X] area/deployments: MLflow Deployments client APIs, server, and third-party Deployments integrations
- [ ] area/docs: MLflow documentation pages
- [ ] area/examples: Example code
- [ ] area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
- [ ] area/models: MLmodel format, model serialization/deserialization, flavors
- [ ] area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
- [ ] area/projects: MLproject format, project running backends
- [ ] area/scoring: MLflow Model server, model deployment tools, Spark UDFs
- [ ] area/server-infra: MLflow Tracking server backend
- [ ] area/tracking: Tracking Service, tracking client APIs, autologging
Interface
- [ ] area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
- [ ] area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
- [ ] area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
- [ ] area/windows: Windows support
Language
- [ ] language/r: R APIs and clients
- [ ] language/java: Java APIs and clients
- [ ] language/new: Proposals for new client languages
Integrations
- [ ] integrations/azure: Azure and Azure ML integrations
- [ ] integrations/sagemaker: SageMaker integrations
- [ ] integrations/databricks: Databricks integrations
How should the PR be classified in the release notes? Choose one:
- [ ] rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
- [ ] rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
- [X] rn/feature - A new user-facing feature worth mentioning in the release notes
- [ ] rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
- [ ] rn/documentation - A user-facing documentation change worth mentioning in the release notes
Documentation preview for 84f5bb90ef257921981bce0e3cca43e9c81ab7b9 will be available when this CircleCI job completes successfully.
More info
- Ignore this comment if this PR does not change the documentation.
- It takes a few minutes for the preview to be available.
- The preview is updated when a new commit is pushed to this PR.
- This comment was created by https://github.com/mlflow/mlflow/actions/runs/8000230275.
@BenWilson2 could you help manually test this PR with your Anthropic API key to confirm it works?
# config.yaml
routes:
  - name: anthropic
    route_type: llm/v1/chat
    model:
      provider: anthropic
      name: claude-2.1
      config:
        anthropic_api_key: <key>
mlflow deployments start-server --config-path config.yaml --workers 1
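Once the server is running, the endpoint can also be queried from Python with the MLflow deployments client. The snippet below is only a sketch assuming the standard mlflow.deployments client API; the curl calls that follow exercise the same route directly.

# Query the locally running deployments server from Python (illustrative).
from mlflow.deployments import get_deploy_client

client = get_deploy_client("http://127.0.0.1:5000")
response = client.predict(
    endpoint="anthropic",
    inputs={
        "messages": [{"role": "user", "content": "Tell me something fun."}],
        "temperature": 1.5,
        "max_tokens": 100,
    },
)
print(response)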
Non-streaming:
curl -X POST http://127.0.0.1:5000/endpoints/anthropic/invocations \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system","content": "You are a funny assistant"},{"role": "user","content": "Hi"},{"role": "assistant","content": "Hi"},{"role": "user","content": "Tell me something fun."}], "temperature": 1.5, "max_tokens": 100}'
Streaming:
curl -X POST http://127.0.0.1:5000/endpoints/anthropic/invocations \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system","content": "You are a funny assistant"},{"role": "user","content": "Hi"},{"role": "assistant","content": "Hi"},{"role": "user","content": "Tell me something fun."}], "stream": true, "temperature": 1.5, "max_tokens": 100}'
Hey @gabrielfu, sorry for dropping the ball on this.
Streaming:
~ via 🅒 base via 🐍 dev-env
➜ curl -X POST http://127.0.0.1:5000/endpoints/anthropic-chat/invocations \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system","content": "You are a funny assistant"},{"role": "user","content": "Hi"},{"role": "assistant","content": "Hi"},{"role": "user","content": "Tell me something fun."}], "stream": true, "temperature": 1.5, "max_tokens": 100}'
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": ""}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "Here"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "'s"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " a"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " silly"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " joke"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " for"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " you"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863952, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": ":"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " Why"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " can"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "'t"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " a"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " bicycle"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " stand"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " up"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " by"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " itself"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "?"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " Because"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " it"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "'s"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": " two"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "-"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "t"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "ired"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": null, "delta": {"role": null, "content": "!"}}]}
data: {"id": "msg_013uCEMi8ow4BRipK6qRzrmW", "object": "chat.completion.chunk", "created": 1709863953, "model": "claude-2.1", "choices": [{"index": 0, "finish_reason": "stop", "delta": {"role": null, "content": null}}]}
Non-streaming:
~ via 🅒 base via 🐍 dev-env
➜ curl -X POST http://127.0.0.1:5000/endpoints/anthropic-chat/invocations \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "system","content": "You are a funny assistant"},{"role": "user","content": "Hi"},{"role": "assistant","content": "Hi"},{"role": "user","content": "Tell me something fun."}], "temperature": 1.5, "max_tokens": 100}'
{"id":"msg_017WWVS7GdTYGmAaduF7DWQv","object":"chat.completion","created":1709863852,"model":"claude-2.1","choices":[{"index":0,"message":{"role":"assistant","content":"Here's something I find amusing - I don't actually experience fun or have a sense of humor myself. As an AI assistant created by Anthropic to be helpful, harmless, and honest, I don't have subjective experiences. I can try to tell jokes or fun facts if you'd like, but I rely on my training to determine what kinds of things tend to elicit amusement or enjoyment from humans!"},"finish_reason":"stop"}],"usage":{"prompt_tokens":29,"completion_tokens":85,"total_tokens":114}}%
Thanks @BenWilson2 for the manual test!