SMART ML upgrades

MVP for adding unique, trainable, project-specific models to SMART.

Docker

Created a new ml container in which all ML-related processes run.

This container has two volumes, /csv and /models, which store the .csv files used to train models and the trained models themselves, respectively.

ML

Removed all embeddings/sentence-transformers code from the backend and moved it to the new ml container. These processes are now exposed through a FastAPI REST API that the backend calls.
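A minimal sketch of how the backend side of that REST call could look, using only the Python standard library. The host and port follow the ml entry in docker-common.yml (container ml, port 8001); the /embeddings/ path, the JSON payload keys, and the response shape are hypothetical placeholders, not the actual ml API.

```python
import json
import urllib.request

# Assumption: the ml service is reachable by its compose name on port 8001.
# The /embeddings/ path and the payload/response keys are hypothetical.
ML_BASE_URL = "http://ml:8001"


def build_embedding_request(project_id: int, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) a POST request asking ml to encode `texts`."""
    payload = {"project_id": project_id, "texts": texts}
    return urllib.request.Request(
        url=f"{ML_BASE_URL}/embeddings/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embeddings(project_id: int, texts: list[str], timeout: float = 30.0) -> list[list[float]]:
    """Send the request; assumes ml replies with {"embeddings": [[...], ...]}."""
    request = build_embedding_request(project_id, texts)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embeddings"]
```

Keeping request construction separate from sending makes the payload format testable without a running ml container.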

To train models, generalized our csv-to-embeddings-model process so that it works with any uploaded .csv sent to ml's FastAPI.
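To illustrate the first step of that process, here is a sketch that parses an uploaded .csv of text pairs into tuples that a training step could consume. The column names sentence1 and sentence2 are assumptions; the real training CSV may use a different layout.

```python
import csv
import io

# Hypothetical column names; the actual training CSV layout may differ.
REQUIRED_COLUMNS = ("sentence1", "sentence2")


def load_text_pairs(csv_text: str) -> list[tuple[str, str]]:
    """Parse an uploaded .csv of text pairs into (text_a, text_b) tuples.

    Rows with a blank value in either required column are skipped.
    Raises ValueError if a required column is missing from the header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in fieldnames]
    if missing:
        raise ValueError(f"CSV is missing required column(s): {', '.join(missing)}")

    pairs = []
    for row in reader:
        # Short rows yield None for absent fields, so normalize to "".
        a = (row[REQUIRED_COLUMNS[0]] or "").strip()
        b = (row[REQUIRED_COLUMNS[1]] or "").strip()
        if a and b:
            pairs.append((a, b))
    return pairs
```

Validating the header up front lets the upload fail fast with a readable error instead of partway through training.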

Now, when SMART needs to encode an embedding or compute similarity comparisons, the backend sends a request to ml with the string(s) to encode. ml first checks whether a trained model already exists under /models. If one exists, ml uses that specifically trained model to compute the embeddings; if not, it falls back to our default SMART model.
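The lookup-then-fallback logic, plus cosine similarity as one common way to compare sentence embeddings, might look roughly like this. It assumes each project's trained model is saved in its own subdirectory, named after the project id, under /code/models (where docker-common.yml mounts the ml_models volume); that layout, the DEFAULT_MODEL identifier, and the choice of cosine similarity are all assumptions rather than confirmed details of the ml service.

```python
import math
from pathlib import Path

# Hypothetical layout: one subdirectory per project id under the ml_models mount.
MODELS_DIR = Path("/code/models")
DEFAULT_MODEL = "default-smart-model"  # hypothetical identifier for the default model


def resolve_model(project_id: int, models_dir: Path = MODELS_DIR) -> str:
    """Return the project's trained model directory if it exists, else the default model."""
    candidate = models_dir / str(project_id)
    if candidate.is_dir():
        return str(candidate)
    return DEFAULT_MODEL


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```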

Backend Updates

Updated all existing backend ML-related processes to integrate with the new ml container.

Created a new MlModel Django model.

Added a new template, "Update ML", accessible from a project's Update page. On this page, an admin can upload a .csv file of text pairs and train a new model for that project. This process involves the following steps:

1. Update or create an MlModel and set its status to "Training".
2. Call ml's FastAPI /train/ endpoint to start the model training process, forwarding the uploaded .csv file.
3. Train the actual model in the ml container.
4. Update the MlModel and set its status to "Trained".
5. Re-compute existing LabelEmbeddings with the newly trained model.
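The five steps above can be sketched as a plain Python function, with the HTTP call to /train/ and the LabelEmbedding refresh passed in as callables so the ordering and status transitions are easy to follow. MlModelRecord is a hypothetical stand-in for the Django MlModel, and the "Failed" status is an assumed addition to show error handling; a real Django implementation would also save the model after each status change.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(str, Enum):
    TRAINING = "Training"  # from the PR
    TRAINED = "Trained"    # from the PR
    FAILED = "Failed"      # hypothetical, for error handling


@dataclass
class MlModelRecord:
    """Hypothetical stand-in for the Django MlModel row."""
    project_id: int
    status: Status = Status.TRAINING


def train_project_model(
    record: MlModelRecord,
    csv_bytes: bytes,
    call_train_endpoint: Callable[[int, bytes], None],
    recompute_label_embeddings: Callable[[int], int],
) -> int:
    """Run the Update ML steps in order; return the number of embeddings recomputed."""
    record.status = Status.TRAINING  # step 1 (Django: record.save())
    try:
        # steps 2-3: forward the CSV to ml's /train/ endpoint, which trains the model
        call_train_endpoint(record.project_id, csv_bytes)
    except Exception:
        record.status = Status.FAILED
        raise
    record.status = Status.TRAINED  # step 4 (Django: record.save())
    return recompute_label_embeddings(record.project_id)  # step 5
```

Injecting the two side effects keeps the workflow testable without a running ml container or database.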

Next Steps

Couldn't get persistent storage working for either the /csv or /models volumes in ml. It's probably a Docker misconfiguration on my end, so if somebody could take a second look and see what isn't configured properly, that would be very helpful!

Once SMART moves to an entirely React-based frontend, we can write a reusable React Query hook for training the model. Unfortunately, React is currently only used on the Annotate tab, so the form and logic for sending the request to ml had to be implemented in Django for now.

I have updated the Docker configuration for /envs/dev; we'll still need to configure it for production.

Jul 05 '23 20:07 dsteedRTI

@dsteedRTI, adding this as a single comment because individual line-level comments could get confusing here. Regarding your Docker persistence issue, you'll need to use named volumes. What's a little wacky is that, with the current pattern of extending a common docker-compose file, the volumes are declared in the dev and prod docker-compose files but referenced in the common file (so you'll need to declare them in both the dev and prod files).

For consistency's sake, you can do the following:

In the volumes section of /envs/dev/docker-compose.yml, add your named volumes like so:

volumes:
  smart_pgdata:
    external: true
    name: vol_smart_pgdata
  smart_data:
    external: true
    name: vol_smart_data
  ml_csv:
  ml_models:

In your envs/docker-common.yml file, use those named volumes in your service definition for the ml service:

ml:
    container_name: ml
    volumes:
      - ../ml:/code
      - ml_csv:/code/csv
      - ml_models:/code/models
    ports:
      - "${EXTERNAL_ML_PORT:-8001}:8001"

Those volumes should then persist when the containers are stopped, recreated, or rebuilt.

One other note: in docker-common.yml, the application is configured to bind-mount several directories from the host:

  backend:
    container_name: backend
    volumes:
      - ../backend/django:/code ## bind mount
      - ../backend/docker:/code/docker ## bind mount
      - ../frontend:/code/frontend ## bind mount
      - smart_data:/data/
    ports:
      - "${EXTERNAL_BACKEND_PORT:-8000}:8000"
    command: ./runserver.sh

  smart_frontend:
    container_name: smart_frontend
    volumes:
      - ../frontend/:/code ## bind mount
      - /code/node_modules

This is typically something you would do in dev to enable hot reloading; in prod you would COPY those files in your Dockerfile instead. It isn't a huge issue, but it does mean your production application bind-mounts directories from the host, which can cause problems with directory permissions and/or performance later on. I'd recommend eventually moving the bind mounts into the dev docker-compose file and ensuring that everything is copied into the image when building for prod.

Jul 06 '23 13:07 hardingalexh

@hardingalexh I can confirm that implementing those Docker configurations fixed persistent storage on dev. Thank you!

Jul 06 '23 17:07 dsteedRTI

(Commenting this for documentation purposes.)

Deleting projects with custom ML models errors out. [Screenshot: Screen Shot 2023-07-19 at 2:43:40 PM]

Jul 19 '23 18:07 dsteedRTI