
Implement client/server functionality

abrichr opened this issue 1 year ago · 5 comments

Feature request

To support https://github.com/openadaptai/SoM we need to implement a client.py using the Gradio client API (https://www.gradio.app/docs/client). See:

  • client.py: https://github.com/microsoft/SoM/pull/19/files#diff-1ebfaf6cb3592166b73835fa82333cb7109e7c624865c0039a7b22ff34aa27fa
  • deploy.py: https://github.com/microsoft/SoM/pull/19/files#diff-5c9ed18af9a5f902219d12c3044ccb193c2c304a3748d02702889c2ca5703978

Motivation

https://github.com/openadaptai/SoM is state-of-the-art for visual understanding, but it only runs on Linux / CUDA.

Refer to system diagram:

[image: system diagram]

Inference (SoM/SAM) must be done remotely.

We wish to implement:

  • openadapt/adapters/som/client.py: modified version of client.py in https://github.com/microsoft/SoM/pull/19 to support getting marked screenshots during analysis (visualization) and replay
  • openadapt/adapters/som/server, which can be a git submodule containing https://github.com/OpenAdaptAI/SoM/
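
A minimal sketch of what openadapt/adapters/som/client.py could look like, assuming the remote SoM server exposes a standard Gradio app. The port, the "/predict" endpoint name, and the return type (a path to the marked screenshot) are all assumptions; verify against the real app with `client.view_api()`.

```python
"""Hypothetical sketch of openadapt/adapters/som/client.py.

Assumptions: the remote SoM server runs a Gradio app whose "/predict"
endpoint accepts an image file path and returns the path of the marked
screenshot. Check client.view_api() against the real deployment.
"""

DEFAULT_PORT = 6092  # assumed port; use whatever deploy.py configures


def build_server_url(host: str, port: int = DEFAULT_PORT) -> str:
    """Build the URL of the remote Gradio app from the instance host."""
    return f"http://{host}:{port}"


def get_marked_screenshot(image_path: str, server_url: str) -> str:
    """Send a screenshot to the SoM server; return the marked screenshot path."""
    # Lazy import so this sketch is readable without gradio_client installed.
    from gradio_client import Client

    client = Client(server_url)
    # api_name="/predict" is an assumption about the server's endpoint.
    return client.predict(image_path, api_name="/predict")
```
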

abrichr · Dec 12 '23 20:12

@FFFiend thoughts? 🙏 😄

abrichr · Feb 29 '24 03:02

I took a look at the client.py code as well as the docs, so from my understanding:

Instead of using a Gradio app URL or HuggingFace space, we would like to create an entry point to the EC2 SoM instance we have available and return a marked screenshot as the output of predict.

Bit confused on how a marked screenshot is defined though 😄

FFFiend · Feb 29 '24 04:02

@FFFiend thanks for your quick response!

Instead of using a Gradio app URL or HuggingFace space, we would like to create an entry point to the EC2 SoM instance we have available and return a marked screenshot as the output of predict.

Exactly right! We would need to integrate the deploy.py and a variation of client.py, which would both be called from elsewhere in openadapt (e.g. visualize.py, replay.py).

Bit confused on how a marked screenshot is defined though 😄

No worries! You can see the marked screenshot in the PR description, reproduced here:

[image: marked screenshot example]

The original screenshot is on the left, the marked screenshot is on the right.

abrichr · Mar 02 '24 23:03

Awesome, so for client.py I'm envisioning the client to work as follows:

  1. Use start and stop from the Deploy class for instantiating and then closing the instance.
  2. Use either paramiko (https://www.paramiko.org/) or pexpect (https://pexpect.readthedocs.io/en/stable/) as a runner for functions within the instance.
  3. Write up or reuse SoM logic from one of the existing demos (demo_som.py, for example) as a function, then plug that into the runner above, and inference is done.

The original repo doesn't have any architecture or heavy ML code laid out, so I'm guessing the meat and potatoes is within the demo files, but I could be wrong.
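
If the SSH route from steps 2–3 is taken, it might be sketched roughly as below. The `ubuntu` username, the `demo_som.py` CLI shape, and the idea that the remote script prints the marked-screenshot path are all assumptions; paramiko is one of the two libraries mentioned above, and `Deploy.start`/`Deploy.stop` would bracket this call separately.

```python
"""Hypothetical sketch of the SSH-runner approach (steps 2-3 above)."""


def build_remote_command(script: str, image_path: str) -> str:
    """Build the command to run SoM inference on the remote instance.

    The script name and CLI flags are assumptions for illustration.
    """
    return f"python {script} --image {image_path}"


def run_inference_over_ssh(host: str, key_filename: str, image_path: str) -> str:
    """Run SoM inference on the remote instance and return its stdout."""
    # Lazy import so the sketch is readable without paramiko installed.
    import paramiko

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    # username="ubuntu" is an assumption about the EC2 AMI.
    ssh.connect(host, username="ubuntu", key_filename=key_filename)
    try:
        cmd = build_remote_command("demo_som.py", image_path)
        _stdin, stdout, _stderr = ssh.exec_command(cmd)
        # Assumes the remote script prints the marked screenshot's path.
        return stdout.read().decode()
    finally:
        ssh.close()
```
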

FFFiend · Mar 06 '24 08:03

@FFFiend thanks for your patience! Just saw this 😅

  1. Use start and stop from the Deploy class for instantiating and then closing the instance.

Agreed.

  2. Use either paramiko (https://www.paramiko.org/) or pexpect (https://pexpect.readthedocs.io/en/stable/) as a runner for functions within the instance.

This may be unnecessary: https://github.com/microsoft/SoM includes a client.py which uses the Gradio API, so our client.py should look similar.

  3. Write up or reuse SoM logic from one of the existing demos (demo_som.py, for example) as a function, then plug that into the runner above, and inference is done.

Bingo! This should all go in client.py.
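
Putting it together, the instance lifecycle could be wrapped so that callers elsewhere in openadapt (e.g. visualize.py, replay.py) never manage the instance directly. `Deploy`'s `start`/`stop` signatures (in particular, that `start` returns the instance host) and the port are assumptions for illustration.

```python
"""Hypothetical glue between Deploy and the Gradio client.

Assumptions: Deploy.start() provisions the instance and returns its
public hostname/IP, Deploy.stop() terminates it, and the SoM Gradio
app listens on port 6092.
"""

from contextlib import contextmanager


@contextmanager
def som_instance(deploy):
    """Yield the server URL of a running SoM instance; stop it on exit."""
    host = deploy.start()  # assumed to return the instance's public host
    try:
        yield f"http://{host}:6092"  # port is an assumption
    finally:
        deploy.stop()
```

A caller would then do `with som_instance(deploy) as url: get_marked_screenshot(path, url)`, and the instance is shut down even if inference raises.
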

abrichr · Mar 18 '24 22:03