Store inference worker behavior
Currently, the inference system lives under /inference. After installing the various dependencies, plus tmux, you can run it using the full-dev-setup.sh script (or use the docker compose inference profile).
The inference system consists of multiple parts:
- one central inference server that connects to a Postgres and a Redis database
- many external workers that perform the actual inference
- each worker in turn consists of two parts: a Python connector and a text-generation backend
- the text-generation backend exposes an HTTP API
- the Python connector talks to that HTTP API and also connects to the central server via WebSocket (see the sketch after this list)
- optionally: the text-client
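To make the data flow concrete, here is a minimal sketch of what the Python connector side of a worker could look like. The URLs, endpoint paths, and message fields below are assumptions for illustration only, not the project's actual API.

```python
# Hypothetical sketch of a worker's Python connector.  Endpoint paths,
# message fields, and URLs are illustrative assumptions, not the real API.
import asyncio
import json

import requests    # calls the local text-generation backend over HTTP
import websockets  # talks to the central inference server over a websocket

BACKEND_URL = "http://localhost:8080/generate"  # assumed backend endpoint
SERVER_WS_URL = "ws://localhost:8000/work"      # assumed central-server endpoint


async def work_loop() -> None:
    async with websockets.connect(SERVER_WS_URL) as ws:
        while True:
            # The central server pushes a task to the worker over the websocket.
            task = json.loads(await ws.recv())

            # The connector forwards the prompt to its local text-generation backend.
            response = requests.post(BACKEND_URL, json={"prompt": task["prompt"]})
            response.raise_for_status()

            # The generated text goes back to the central server.
            await ws.send(json.dumps({"task_id": task["id"],
                                      "text": response.json()["text"]}))


if __name__ == "__main__":
    asyncio.run(work_loop())
```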
The goal of this issue is to store the behavior of the inference workers in the central server's database. We would like to track which worker did which task, when the task was handed out, when the final answer was received, whether there was an error, whether the worker disconnected (and whether that happened before or after it received the task), etc., so that we can properly assign credit to good workers and eventually notice bad or adversarial workers.
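As a rough illustration (not the actual schema), one way to record this would be one event row per task assignment, sketched here with SQLModel; all table and field names are hypothetical:

```python
# Illustrative only: the table and field names here are hypothetical, not the
# project's actual schema.
from datetime import datetime
from typing import Optional
from uuid import UUID, uuid4

from sqlmodel import Field, SQLModel


class WorkerTaskEvent(SQLModel, table=True):
    id: UUID = Field(default_factory=uuid4, primary_key=True)
    worker_id: UUID = Field(index=True)              # which worker handled the task
    task_id: UUID = Field(index=True)                # which task it was
    assigned_at: datetime                            # when the task was handed out
    completed_at: Optional[datetime] = None          # when the final answer arrived
    error: Optional[str] = None                      # error message, if any
    worker_disconnected: bool = False                # did the worker drop the connection?
    disconnected_before_task: Optional[bool] = None  # before or after receiving the task?
```

A single row per task assignment like this would cover the error and disconnect cases above and gives something to aggregate per worker when assigning credit.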
Wouldn't it be too big to store the whole input/output/etc? How big do you estimate one datapoint is?
it's totally fine, it's just a bit of text
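(Back-of-envelope, with assumed numbers: a few thousand characters of prompt plus completion is a few KB of text, plus some timestamps and IDs, so even a million such datapoints would only be on the order of a few GB.)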
This was completed in commit 5d6d371c2ff23575a95a1a41716d5f654283c147.