ML upgrades
MVP for adding unique, trainable, project-specific models to SMART.
Docker
Created a new ml container to do all ML-related processes in.
This container has two volumes, /csv and /models, which store the .csv files used to train models and the trained models, respectively.
ML
Removed all embeddings/sentencetransformers code from the backend and moved it to the new ml container. These processes are now exposed through a FastAPI REST API that the backend calls.
To train models, abstracted our csv-to-embeddings-model process so that it works with an uploaded csv sent to ml's FastAPI.
When SMART now needs to encode an embedding or compute similarity comparisons, backend sends a request to ml with the desired string(s) to encode. ml first checks whether a trained model already exists under /models. If one exists, it uses that specifically trained model to compute the embeddings; if not, it uses our default SMART model.
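The lookup-with-fallback described above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the one-subdirectory-per-project layout under /models, the function names, and the DEFAULT_MODEL placeholder are all assumptions.

```python
from pathlib import Path
from typing import Optional

# Assumed layout: one subdirectory per project under the /models volume.
MODELS_DIR = Path("/models")
# Hypothetical placeholder for whatever identifies the default SMART model.
DEFAULT_MODEL = "default-smart-model"


def resolve_model_path(project_id: int, models_dir: Path = MODELS_DIR) -> Optional[Path]:
    """Return the project's trained model directory, or None if there is none."""
    candidate = models_dir / str(project_id)
    # An empty directory is treated as "not trained yet".
    if candidate.is_dir() and any(candidate.iterdir()):
        return candidate
    return None


def select_model(project_id: int, models_dir: Path = MODELS_DIR) -> str:
    """Pick the project-specific model if one exists, else the default."""
    path = resolve_model_path(project_id, models_dir)
    return str(path) if path is not None else DEFAULT_MODEL
```

Keeping the check in one small function means the encode and similarity endpoints can share the same fallback rule.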
Backend Updates
Updated all existing backend ML-related processes to integrate with the new ml container.
Created new MlModel in Django models.
Added a new template accessible through a project's Update page, "Update ML". On this page an admin can upload a .csv file with text pairs and train a new model to be used on that project. This process involves:
- Update/create a MlModel and set its status to "Training"
- Call ml's FastAPI /train/ endpoint to kick off the model training process, forwarding the uploaded .csv file.
- Train the actual model in the ml container.
- Update the MlModel and set its status to "Trained".
- Update existing LabelEmbeddings to be re-computed with the newly trained model
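The ordering of the steps above can be sketched as below. This is a framework-free illustration under stated assumptions: the record is a plain dict standing in for MlModel, and `train` and `recompute` are hypothetical callables standing in for the request to ml's /train/ endpoint and the LabelEmbeddings refresh. It does not show the actual Django view.

```python
from enum import Enum
from typing import Callable, Dict


class Status(str, Enum):
    # The two status values named in the steps above.
    TRAINING = "Training"
    TRAINED = "Trained"


def run_training(
    record: Dict[str, str],
    csv_bytes: bytes,
    train: Callable[[bytes], None],
    recompute: Callable[[], None],
) -> Dict[str, str]:
    """Walk a model record through the training steps in order."""
    # 1. Mark the record as training before any work starts.
    record["status"] = Status.TRAINING.value
    # 2-3. Forward the uploaded csv; training itself happens in the ml container.
    train(csv_bytes)
    # 4. Mark the record as trained once the call returns.
    record["status"] = Status.TRAINED.value
    # 5. Re-compute the existing label embeddings with the new model.
    recompute()
    return record
```

Passing the two side effects in as callables keeps the sequencing easy to test without a running ml container.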
Next Steps
Couldn't get persistent storage working for either the /csv or /models volume in ml... probably just a Docker thing I'm messing up, so if somebody can be a second set of eyes on that and see what's not configured properly, that would be very helpful!
Once SMART is moved to an entirely React-based frontend, we can make a reusable hook with React Query for training the model. Unfortunately, React is only used on the Annotate tab right now, so the form/logic for sending the request to ml had to be done through Django for the time being.
I have updated Docker for /envs/dev and we'll need to configure it for production.
@dsteedRTI adding this as a comment because individual line-level comments could get confusing here. Regarding your docker persistence issue, you'll need to name the volumes. What's a little wacky is that in the current pattern with extending a common docker-compose file, you're naming those volumes in the dev and prod docker-compose files, but referencing them in the common file (so you'll need to declare the volumes in both the dev and prod files).
For consistency's sake, you can do the following:
- In the volumes section of /envs/dev/docker-compose.yml, add your named volumes like so:
volumes:
  smart_pgdata:
    external: true
    name: vol_smart_pgdata
  smart_data:
    external: true
    name: vol_smart_data
  ml_csv:
  ml_models:
- In your envs/docker-common.yml file, use those named volumes in your service definition for the ml service:
ml:
  container_name: ml
  volumes:
    - ../ml:/code
    - ml_csv:/code/csv
    - ml_models:/code/models
  ports:
    - "${EXTERNAL_ML_PORT:-8001}:8001"
Those volumes should then be persistent.
One other note I have is that in docker-common.yml, the application is configured to bind-mount a number of things from the host:
backend:
  container_name: backend
  volumes:
    - ../backend/django:/code ## bind mount
    - ../backend/docker:/code/docker ## bind mount
    - ../frontend:/code/frontend ## bind mount
    - smart_data:/data/
  ports:
    - "${EXTERNAL_BACKEND_PORT:-8000}:8000"
  command: ./runserver.sh
smart_frontend:
  container_name: smart_frontend
  volumes:
    - ../frontend/:/code ## bind mount
    - /code/node_modules
This is typically something that you would do in dev to enable hot reloading, but in prod you would COPY those files in your Dockerfile. This isn't a huge issue, but it does mean that your production application is bind-mounting some directories from the host, which can cause future problems with directory permissions and/or performance. I'd recommend at some point moving the bind mounts to the dev docker-compose file and ensuring that everything is copied when building for prod.
@hardingalexh I can confirm implementing those Docker configurations fixed persistent storage on dev, thank you!
(Commenting this to document)
Deleting projects with custom ML models errors out: