
Basic implementation of a plugin system for OA

draganjovanovich opened this issue 1 year ago • 14 comments


Hi, this is my first PR here, but I was somewhat active on other fronts of OA development.

This PR brings some basic plugin functionality to Open-Assistant. As discussed with @yk and @andreaskoepf, there are quite a few directions in which something like this could be integrated with OA, but this should serve as an initial proof of concept and an exploratory feature for 3rd-party integrations with OA.

I also included a small calculator plugin as a possible candidate for an OA-internal plugin, which would act as the default one for people to try out, and also as an example of how one could implement their own plugins.

If we decide to include this plugin, we should add a deployment/hosting mechanism for it.
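
To give a rough idea of the shape of such a plugin, here is a minimal sketch of a standalone FastAPI service that exposes an ai-plugin.json manifest and a single endpoint, in the spirit of the calculator plugin. The manifest fields and the /calculate endpoint are illustrative assumptions, not the exact ones used in this PR.

from fastapi import FastAPI

app = FastAPI()

# Manifest that the inference server fetches to discover the plugin.
# Field names here are assumptions; see the calculator plugin for the real ones.
AI_PLUGIN_MANIFEST = {
    "schema_version": "v1",
    "name_for_human": "Example Calculator",
    "name_for_model": "calculator",
    "description_for_model": "Evaluate simple arithmetic expressions.",
}

@app.get("/ai-plugin.json")
def ai_plugin() -> dict:
    return AI_PLUGIN_MANIFEST

@app.get("/calculate")
def calculate(expression: str) -> dict:
    # Naive evaluation for illustration only; a real plugin should parse and
    # validate the expression instead of calling eval().
    try:
        result = eval(expression, {"__builtins__": {}}, {})
    except Exception as exc:
        return {"error": str(exc)}
    return {"result": result}

A service like this can be started with uvicorn main:app --reload --port 8085 --host 0.0.0.0 and then pointed to from the inference server config.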

I will push a separate branch in the next couple of days that will serve as an alternative approach, so we can A/B test it alongside the new models (SFTs/RLHFs).

I also tried to comment on every weird quirk or decision in the code so that one can easily understand and change it, but there are quite a few places where something as small as a newline character or a " character in a specific string can affect LLM performance in the plugin usage scenario.

I will also try to push some documentation regarding plugin development, but there are already some useful comments in the calculator plugin on what needs attention.

Here are some of the current UI changes introduced with this PR.

Plugin chooser component (screenshot)
Plugin execution details component (screenshots)
Plugin-assisted answer (screenshots)
Verified plugin usage UI look (screenshot)
Some plugin usage examples (screenshots)
Mixed usage example where the model chooses not to use a plugin on its own (screenshot)

draganjovanovich (Apr 19 '23)

So cool! If possible, can you add some docstrings to some of the important functions to make it easier to just jump in and understand them? I'm not 100% sure if we have any standards around this in the repo yet, but since this is such a big PR, I was thinking that starting with some docs there might be the least painful approach for now.

andrewm4894 (Apr 20 '23)

What would be a good way to help test this out?

Would it be feasible for me to try this "locally" in some way?

andrewm4894 (Apr 20 '23)

I can tell you how I ran this "locally" if it helps.

I used docker for the complete OA stack on my local server, like: docker compose --profile frontend-dev --profile inference up --build --attach-dependencies. The inference server was run on a remote server (4xA100 80GB), sharded with https://github.com/huggingface/text-generation-inference. You can then point the worker to that inference server by editing worker_full_main.sh. I am not sure that is the easiest way, but that is how I did it. You can easily fit it on a single A100, or even a weaker GPU with half precision, but if you want to do prompt engineering, which requires re-running generations over and over again, you will give up soon because of the slowness... at least I did, lol.

As for the requirements for the model, there are some details in the readme.md files inside the inference and worker folders.

draganjovanovich (Apr 20 '23)

I used docker for the complete OA stack on my local server like: docker compose --profile frontend-dev --profile inference up --build --attach-dependencies

Strangely I keep encountering

Error response from daemon: network 102b7c8cf4e45cd48696f81aaef73b33e88df2d979cdad8ca372c911c9753b7b not found

with the frontend. Can you also elaborate on how to run it using just the inference text client, reducing the number of container services we need? For instance, what docker command line do you use to start the calculator plugin service?

EDIT: I cleaned up that issue, but the above is more about ways to set up a service like the calculator and possibly reduce the service dependencies to just a text client. In both cases we will need to start the plugin service, as it won't be available from the GUI frontend either.

pevogam (Apr 21 '23)

Hi,

The plugin is not included in the docker containers, as it's just an example for now; there is no internal agreement yet on how we would go about default 'out-of-the-box' plugins.

You can for now just run this command from the plugin folder: uvicorn main:app --reload --port 8085 --host 0.0.0.0

And update the IP of the plugin in inference/server/oasst_inference_server/routes/configs.py according to your network environment.

draganjovanovich (Apr 21 '23)

Hi,

The plugin is not included in the docker containers, as it's just an example for now; there is no internal agreement yet on how we would go about default 'out-of-the-box' plugins.

That's ok. The idea, however, is to spin it up so that we can experiment with a plugin in general and with the changes here.

You can for now just run this command from the plugin folder: uvicorn main:app --reload --port 8085 --host 0.0.0.0

And update the IP of the plugin in inference/server/oasst_inference_server/routes/configs.py according to your network environment.

Not sure if all of this is enough:

(screenshot: the borked plugin menu in the web UI)

I did

pip install uvicorn starlette
/mnt/local/notebooks/Open-Assistant/inference/worker/plugins/calculator> uvicorn main:app --reload --port 8085 --host 0.0.0.0
INFO:     Will watch for changes in these directories: ['/mnt/local/notebooks/Open-Assistant/inference/worker/plugins/calculator']
INFO:     Uvicorn running on http://0.0.0.0:8085 (Press CTRL+C to quit)
INFO:     Started reloader process [29640] using statreload
INFO:     Started server process [29642]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     172.20.0.12:42990 - "GET /ai-plugin.json HTTP/1.1" 200 OK
INFO:     172.20.0.10:44318 - "GET /ai-plugin.json HTTP/1.1" 200 OK
INFO:     192.168.40.1:46356 - "GET / HTTP/1.1" 404 Not Found
INFO:     172.20.0.10:51110 - "GET /ai-plugin.json HTTP/1.1" 200 OK

and adapted the IP in

diff --git a/inference/server/oasst_inference_server/routes/configs.py b/inference/server/oasst_inference_server/routes/configs.py
index 46a8ff99..7c0f2a0e 100644
--- a/inference/server/oasst_inference_server/routes/configs.py
+++ b/inference/server/oasst_inference_server/routes/configs.py
@@ -12,7 +12,7 @@ from oasst_shared.schemas import inference
 # NOTE: Replace this with plugins that we will provide out of the box
 DUMMY_PLUGINS = [
     inference.PluginEntry(
-        url="http://192.168.0.35:8085/ai-plugin.json",
+        url="http://192.168.0.31:8085/ai-plugin.json",
         enabled=False,
         trusted=True,
     ),

then tested via telnet that the port is open, and it is. The web UI shows the borked menu you can see above, and the logs don't show the plugin list being populated with everything.

pevogam (Apr 21 '23)

All that you have done seems about right. I am not sure why it is not working for you; do you have errors in the docker logs?

draganjovanovich (Apr 21 '23)

It would be cool if you or someone could make some automated "dev" chat sessions, like: I or any dev provide input messages as JSON files, drop them onto the chat UI, and it automatically sends one message, waits for an answer, and then continues with the second message, etc. For now, I do that manually.
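
For illustration, a helper along these lines could be as simple as the following sketch. The endpoint URL and payload shape here are placeholders, not the real OA inference API.

import json
import time

import requests

# Placeholder endpoint; the real OA chat API would look different.
CHAT_ENDPOINT = "http://localhost:8000/chats/dev-session/messages"

def replay_chat(path: str) -> None:
    # The JSON file is assumed to be a plain list of prompter messages.
    with open(path) as f:
        messages = json.load(f)
    for text in messages:
        response = requests.post(CHAT_ENDPOINT, json={"content": text})
        response.raise_for_status()
        print("assistant:", response.json())
        time.sleep(1)  # small pause between turns

if __name__ == "__main__":
    replay_chat("dev_chat_session.json")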

draganjovanovich (Apr 21 '23)

All that you have done seems about right. I am not sure why it is not working for you; do you have errors in the docker logs?

Unfortunately no errors of any kind (also no Failed to fetch plugin config log messages from the server). Do the logs contain useful debugging information about whether the plugin was detected and loaded in some way? I can see the initial GET request for the service

INFO:     172.20.0.10:37900 - "GET /ai-plugin.json HTTP/1.1" 200 OK

and I can see the minimal plugin being listed in the work request:

plugins=[PluginEntry(url='Test', enabled=True, plugin_config=None, trusted=False)]

Perhaps the plugin_config should be populated and values here should be different?
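
For comparison, I would expect a working entry to look more like the DUMMY_PLUGINS entry above, something like this (just a guess based on the fields shown there):

from oasst_shared.schemas import inference

plugin = inference.PluginEntry(
    url="http://192.168.0.31:8085/ai-plugin.json",
    enabled=True,
    trusted=True,
)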

pevogam (Apr 21 '23)

Yes, there are some logs in routes/configs.py, like: (screenshot of log output)

draganjovanovich (Apr 21 '23)

Wondering, for a big PR like this, is there a way (or does it make sense) to deploy it as a feature branch to its own testing env, so that people reviewing can easily play with it on a specific dev env?

Not sure how feasible that would be on the current infra and DevOps side of things, but was just wondering if it could be useful if it was possible.

andrewm4894 (Apr 22 '23)

Not sure how feasible that would be on the current infra and DevOps side of things, but was just wondering if it could be useful if it was possible.

good idea, but gonna be pretty hard currently :D I think we'll go the route of just getting it onto dev/staging and then if things don't work out, fix or revert

yk (Apr 23 '23)

I think the last commit deleted website/package-lock.json when it should have deleted package-lock.json in the repo root

olliestanley (Apr 30 '23)

Hmm, seems like the merge of main into the PR branch has caused some mess in the PR history

olliestanley (May 01 '23)

I reverted it and will try a rebase to fix the conflicts

draganjovanovich (May 01 '23)

:x: pre-commit failed. Please run pre-commit run --all-files locally and commit the changes. Find more information in the repository's CONTRIBUTING.md

github-actions[bot] (May 01 '23)