DBRX (Databricks LLM) example notebook

Open tj-cycyota opened this issue 1 year ago • 15 comments

Why are these changes needed?

This PR adds a single notebook demonstrating how to use AutoGen with DBRX via the Databricks-hosted Foundation Model API. It illustrates three ways DBRX can be used with Assistant and Conversable Agents, and shows how to persist chat logs to a Delta table.
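For readers who have not seen such a setup, the following is a minimal sketch of what an AutoGen-style `config_list` entry for a Databricks-hosted DBRX endpoint might look like. The model name `databricks-dbrx-instruct` is taken from the logged request quoted later in this thread; the helper name `build_dbrx_config`, the `DATABRICKS_TOKEN` environment variable, and the `/serving-endpoints` base path are assumptions for illustration and may not match the notebook exactly.

```python
import os


def build_dbrx_config(workspace_host, token=None):
    """Build a one-entry config list for an OpenAI-compatible DBRX endpoint.

    `workspace_host` is a hypothetical Databricks workspace hostname
    (no scheme), e.g. "example.cloud.databricks.com".
    """
    # Fall back to an environment variable (assumed name) if no token is passed.
    token = token or os.environ.get("DATABRICKS_TOKEN", "")
    if not token:
        raise ValueError("No Databricks token provided")

    # Assumption: the workspace exposes an OpenAI-compatible API under this path.
    base_url = f"https://{workspace_host.strip('/')}/serving-endpoints"

    return [
        {
            "model": "databricks-dbrx-instruct",
            "api_key": token,
            "base_url": base_url,
        }
    ]
```

A list built this way would typically be passed to an agent as `llm_config={"config_list": ...}`; the notebook itself remains the reference for the exact settings used.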

Related issue number

(N/A - new example case)

Checks

  • [N/A] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
  • [N/A] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [N/A] I've made sure all auto checks have passed.

tj-cycyota avatar Apr 18 '24 15:04 tj-cycyota

@microsoft-github-policy-service agree

tj-cycyota avatar Apr 18 '24 15:04 tj-cycyota

LGTM. Thanks for this awesome PR!

I have only 2 optional comments:

  1. Can we execute the notebook so that users can see the output of the code?
  2. @sonichi Do you think we should put this notebook into the "website/docs/topics/non-openai-models/cloud-databricks.ipynb"?

Good idea. Before merging, the code format error needs to be fixed: https://github.com/microsoft/autogen/actions/runs/8740674362/job/23985024316?pr=2434 You can use pre-commit to help.

sonichi avatar Apr 18 '24 18:04 sonichi

An easy way to do it is to run the following commands in the terminal.

git add .
pre-commit run --all-files
git add .

BeibinLi avatar Apr 18 '24 19:04 BeibinLi

⚠️ GitGuardian has uncovered 1 secret following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secret in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
| --- | --- | --- | --- | --- |
| 10404662 | Triggered | Generic CLI Secret | 0aeb4ed6d0a2ba936b124aa43b6532befd12443f | .github/workflows/dotnet-release.yml |
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secret safely. Learn here the best practices.
  3. Revoke and rotate this secret.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

gitguardian[bot] avatar Apr 18 '24 19:04 gitguardian[bot]

Thanks for being willing to accept this contribution! Added cell outputs to the ipynb.

Can somebody help me understand the failing test? Only thing I see is:

notebook/agentchat_databricks_dbrx.ipynb:cell 4:12:34: F821 Undefined name `dbutils`
notebook/agentchat_databricks_dbrx.ipynb:cell 17:33:11: F821 Undefined name `spark`

which are valid commands in Databricks (e.g. methods are auto-imported).
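To illustrate why the linter complains: `dbutils` and `spark` are injected by the Databricks runtime, so a static checker sees them as undefined. Below is a minimal sketch of one way a cell could reference `dbutils` while staying lint-clean and still running outside Databricks. The function name, the secret scope and key, and the `DATABRICKS_TOKEN` fallback variable are all hypothetical.

```python
import os


def get_databricks_token(scope="my-scope", key="databricks-token"):
    """Return an API token from Databricks secrets, or from the environment.

    `scope` and `key` are placeholder names, not values from the notebook.
    """
    try:
        # `dbutils` exists only inside a Databricks runtime; the noqa marker
        # tells ruff/flake8 to ignore the F821 "undefined name" warning here.
        return dbutils.secrets.get(scope=scope, key=key)  # noqa: F821
    except NameError:
        # Outside Databricks the name is undefined, so fall back to an
        # environment variable (assumed name).
        return os.environ.get("DATABRICKS_TOKEN")
```

The alternative discussed in this thread is to exclude the notebook from the ruff check in the pre-commit configuration, which avoids touching the notebook code at all.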

I see a security check failure as well, but that's not the file I committed.

tj-cycyota avatar Apr 18 '24 22:04 tj-cycyota

@tj-cycyota Don't worry about the security check failure. It was from updates from other commits, which we are trying to resolve now.

BeibinLi avatar Apr 19 '24 20:04 BeibinLi

@ekzhu latest changes have the following:

  • Set chat_result to a variable and reduced printouts
  • Added metadata entry to ipynb
  • Added author details
  • Added databricks.md and image to docs section.
  • Commented out ruff tests

Some pre-merge tests are failing, but it's not clear to me why. Any ideas here?

tj-cycyota avatar Apr 25 '24 23:04 tj-cycyota

@tj-cycyota I just merged main into this branch, so please pull before committing. Just a few things:

  1. Fix the ruff setting in the pre-commit config to properly exclude the notebook.
  2. Run pre-commit locally to fix the formatting error in CI. https://microsoft.github.io/autogen/docs/contributor-guide/pre-commit

ekzhu avatar May 03 '24 00:05 ekzhu

@tj-cycyota I fixed the formatting for you: https://github.com/tj-cycyota/autogen/pull/4

@jackgerrits It looks like the notebook was created using Databricks, not Jupyter Notebook. So this code cell

https://github.com/tj-cycyota/autogen/blob/628a1d6995067eb16717b2e48f0096539612b83c/notebook/agentchat_databricks_dbrx.ipynb#L47

causes this error after being converted to MDX:

https://github.com/microsoft/autogen/actions/runs/8973428444/job/24643573392?pr=2434#step:9:51

ekzhu avatar May 08 '24 00:05 ekzhu

I'd recommend inspecting the generated MDX to see what the issue is. It could be many things and it's just way easier to inspect the MDX and go through a few iterations of trying to fix and checking the mdx locally.

jackgerrits avatar May 08 '24 11:05 jackgerrits

@jackgerrits would you mind providing a bit more direction on this issue? I would love to contribute to AutoGen and am happy to take feedback (as the PR history shows), but this is quite a nebulous and uninformative error: [image]

tj-cycyota avatar May 08 '24 17:05 tj-cycyota

No worries, install Quarto in the way mentioned on this page

Then, convert your notebook to MDX using the command quarto render <notebook>. This should produce an MDX file. Now that you have the MDX, there are a couple of options:

  • Inspect the line from the error you pasted to see if anything seems obvious
  • Open up the MDX file in vscode and install the mdx extension and it might simply highlight the issue
  • Paste the MDX file here and I can eyeball it to see if anything is obvious

jackgerrits avatar May 08 '24 18:05 jackgerrits

@jackgerrits this produces an .html file, attached. Can you spot anything obviously wrong with the format? Perhaps it's some of the extra metadata in the ipynb formatting? agentchat_databricks_dbrx.zip

tj-cycyota avatar May 08 '24 22:05 tj-cycyota

I found these blocks in the generated MDX files that are causing issues:

::: {.cell
application/vnd.databricks.v1+cell=‘{“cellMetadata”:{},“inputWidgets”:{},“nuid”:“d5eb9c93-2dd5-4432-b0a5-66da9dafee6d”,“showTitle”:false,“title”:““}’}

And this one:

<div dangerouslySetInnerHTML={{ __html: quartoRawHtml[0] }} />

| id  | invocation_id                        | client_id       | wrapper_id      | session_id                           | request                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               
                                                                                                                                                                                                                                                                                                                                                                                                          | response                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | is_cached | cost | start_time                 | end_time                   |
| --- | ------------------------------------ | --------------- | --------------- | ------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------- | ---- | -------------------------- | -------------------------- |
| 3   | cbff4e98-de75-40b5-9716-ffe2cd0d0b87 | 139937846539024 | 139937846955264 | 6c389f5f-3619-4762-8118-bc98dd414f90 | {"messages": \[{"content": "You are a helpful AI assistant.\nSolve tasks using your coding and language skills.\nIn the following cases, suggest python code (in a python coding block) or shell script (in a sh coding block) for the user to execute.\n 1. When you need to collect info, use the code to output the info you need, for example, browse or search the web, download/read a file, print the content of a webpage or a file, get the current date/time, check the operating system. After sufficient info is printed and the task is ready to be solved based on your language skill, you can solve the task by yourself.\n 2. When you need to perform some task with code, use the code to perform the task and output the result. Finish the task smartly.\nSolve the task step by step if you need to. If a plan is not provided, explain your plan first. Be clear which step uses code, and which step uses your language skill.\nWhen using code, you must indicate the script type in the code block. The user cannot provide any other feedback or perform any other action beyond executing the code you suggest. The user can't modify your code. So do not suggest incomplete code which requires users to modify. Don't use a code block if it's not intended to be executed by the user.\nIf you want the user to save the code in a file before executing it, put \# filename: <filename> inside the code block as the first line. Don't include multiple code blocks in one response. Do not ask users to copy and paste the result. Instead, use 'print' function for the output when relevant. Check the execution result returned by the user.\nIf the result indicates there is an error, fix the error and output the code again. Suggest the full code instead of partial code or code changes. 
If the error can't be fixed or if the task is not solved even after the code is executed successfully, analyze the problem, revisit your assumption, collect additional info you need, and think of a different approach to try.\nWhen you find an answer, verify the answer carefully. Include verifiable evidence in your response if possible.\nReply \\TERMINATE\\ in the end when everything is done.\n ", "role": "system"}, {"content": "What is MLflow?", "role": "user"}\], "model": "databricks-dbrx-instruct"} | { "id": "59b5e537-e14a-4afd-9b1f-d046f5372af7", "choices": \[ { "finish_reason": "stop", "index": 0, "logprobs": null, "message": { "content": "Sure, I'd be happy to explain MLflow to you. MLflow is an open-source platform for managing machine learning workflows. It was developed by Databricks and was open-sourced in 2018. MLflow provides a number of features to help data scientists and machine learning engineers manage the end-to-end machine learning lifecycle, including:\n\n1. \*\*MLflow Tracking\*\*: This is a logging API that allows you to record and query experiments, including code, data, config, and results.\n2. \*\*MLflow Projects\*\*: This is a format for packaging reusable and reproducible data science code, which can be run on different platforms.\n3. \*\*MLflow Models\*\*: This is a convention for packaging machine learning models in multiple formats, making it easy to deploy in different environments.\n4. 
\*\*MLflow Model Registry\*\*: This is a central repository to store, manage, and serve machine learning models.\n\nHere is a Python code example of how you might use MLflow Tracking to log a simple experiment:\n\`\`\`python\n# filename: mlflow_example.py\n\nimport mlflow\nimport numpy as np\n\n# Log a parameter (e.g., number of trees in a random forest)\nmlflow.log_param(\\num_trees\\, 100)\n\n# Log a metric (e.g., accuracy of a model)\naccuracy = np.random.rand()\nmlflow.log_metric(\\accuracy\\, accuracy)\n\n# Log the model\nmlflow.sklearn.log_model(model, \\model\\)\n\n# End the run\nmlflow.end_run()\n\`\`\`\nTo run this code, you would need to have MLflow installed and running on your machine. You can install MLflow using pip:\n\`\`\`\npip install mlflow\n\`\`\`\nThen, you can run the code using the following command:\n\`\`\`\npython mlflow_example.py\n\`\`\`\nThis will create a new experiment in MLflow and log the parameters, metrics, and model. You can then view the experiment in the MLflow UI.\n\nI hope this helps! 
Let me know if you have any other questions.", "role": "assistant", "function_call": null, "tool_calls": null } } \], "created": 1713446636, "model": "dbrx-instruct-032724", "object": "chat.completion", "system_fingerprint": null, "usage": { "completion_tokens": 409, "prompt_tokens": 478, "total_tokens": 887 }, "cost": 0, "message_retrieval_function": "def message_retrieval(\n self, response: Union\[ChatCompletion, Completion\]\n ) -\> Union\[List\[str\], List\[ChatCompletionMessage\]\]:\n \\\\\\Retrieve the messages from the response.\\\\\\\n choices = response.choices\n if isinstance(response, Completion):\n return \[choice.text for choice in choices\] \# type: ignore \[union-attr\]\n\n if TOOL_ENABLED:\n return \[ \# type: ignore \[return-value\]\n (\n choice.message \# type: ignore \[union-attr\]\n if choice.message.function_call is not None or choice.message.tool_calls is not None \# type: ignore \[union-attr\]\n else choice.message.content\n ) \# type: ignore \[union-attr\]\n for choice in choices\n \]\n else:\n return \[ \# type: ignore \[return-value\]\n choice.message if choice.message.function_call is not None else choice.message.content \# type: ignore \[union-attr\]\n for choice in choices\n \]" } | 1         | 0.0  | 2024-04-25 12:48:25.468565 | 2024-04-25 12:48:25.469843 |

<div dangerouslySetInnerHTML={{ __html: quartoRawHtml[1] }} />

And others similar to the ones above.

After removing those blocks the website builds properly.

So could it be that { and } are not properly escaped? @jackgerrits
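One way to check that hypothesis locally is to scan the generated MDX for braces outside fenced code blocks. The sketch below is a rough heuristic, not part of the AutoGen tooling: the function name `find_unescaped_braces` is hypothetical, and it will also flag legitimate JSX expressions (such as the `dangerouslySetInnerHTML` lines above), so its output is only a list of candidates to inspect by hand.

```python
import re


def find_unescaped_braces(mdx_text):
    """Return (line_number, line) pairs that contain a bare { or }.

    Lines inside fenced code blocks, inline code spans, and
    backslash-escaped braces are ignored.
    """
    suspects = []
    in_fence = False
    for lineno, line in enumerate(mdx_text.splitlines(), start=1):
        # Toggle state on fence markers and skip everything inside a fence.
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        # Drop inline code spans and already-escaped braces before checking.
        visible = re.sub(r"`[^`]*`", "", line)
        visible = visible.replace("\\{", "").replace("\\}", "")
        if "{" in visible or "}" in visible:
            suspects.append((lineno, line))
    return suspects
```

Running this over the rendered file and comparing the reported line numbers against the line in the build error may narrow down which block is responsible.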

ekzhu avatar May 09 '24 17:05 ekzhu

Is it possible to adapt the parser to be able to support Databricks .ipynb metadata? I'd prefer not to manually remove metadata from every notebook cell.
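If the parser cannot be adapted, the manual step could at least be scripted. Here is a minimal sketch that removes Databricks-specific metadata from a notebook's JSON. The cell-level key `application/vnd.databricks.v1+cell` is the one visible in the MDX excerpt above; treating every metadata key with the `application/vnd.databricks` prefix (at cell or notebook level) as removable is an assumption, and the function names are hypothetical.

```python
import json

# Assumption: all Databricks-specific metadata keys share this prefix.
DATABRICKS_PREFIX = "application/vnd.databricks"


def strip_databricks_metadata(notebook):
    """Delete Databricks metadata keys in place; return how many were removed."""
    removed = 0
    # Check the notebook-level metadata and every cell's metadata.
    for container in [notebook, *notebook.get("cells", [])]:
        metadata = container.get("metadata", {})
        for key in [k for k in metadata if k.startswith(DATABRICKS_PREFIX)]:
            del metadata[key]
            removed += 1
    return removed


def clean_notebook_file(path):
    """Rewrite the .ipynb file at `path` without Databricks metadata."""
    with open(path, encoding="utf-8") as f:
        notebook = json.load(f)
    removed = strip_databricks_metadata(notebook)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(notebook, f, indent=1)
        f.write("\n")
    return removed
```

Calling `clean_notebook_file` on a copy of the notebook first would make it easy to diff the result before committing.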

It looks like Quarto has an issue with the display() command, so I can change that out. I'm also OK with not having this notebook rendered on the website if there's a contributor option for that.

tj-cycyota avatar May 13 '24 18:05 tj-cycyota

I am not familiar with what makes a Databricks notebook unique and different. If you'd like to just skip it, you can see how here: https://github.com/microsoft/autogen/blob/main/notebook/contributing.md#metadata-fields

jackgerrits avatar May 17 '24 21:05 jackgerrits

@ekzhu docs are passing now after scrubbing formatting. Let me know if any other changes are needed before merge?

tj-cycyota avatar May 17 '24 23:05 tj-cycyota

@tj-cycyota thanks much! The only thing left is to fix the formatting error by running pre-commit run --all-files locally and committing the formatting fixes.

ekzhu avatar May 21 '24 23:05 ekzhu

@ekzhu thanks for the guidance, all tests are passing now!

tj-cycyota avatar May 24 '24 12:05 tj-cycyota