
Inference database dummy data fill ability

Open olliestanley opened this issue 2 years ago • 11 comments

There's some dummy data here which was used for filling the data collection backend DB; it could maybe be reused.

In that backend we had a setting which was used on server start to determine whether to fill with data, see here
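The flag-checked-on-startup pattern being referenced can be sketched in a few lines. This is a stdlib-only illustration: the names `FILL_DB_WITH_SEED_DATA`, `should_fill_seed_data`, and `on_startup` are made up here, not the actual Open-Assistant identifiers (the real backend reads a field on its pydantic settings object).

```python
import os

def should_fill_seed_data() -> bool:
    # Hypothetical env var name; the real backend reads a boolean field
    # on its settings object rather than the environment directly.
    return os.environ.get("FILL_DB_WITH_SEED_DATA", "false").lower() in ("1", "true", "yes")

def on_startup(insert_fn) -> None:
    # Run the seed-data insert only when the flag is enabled, so a
    # production deployment is never filled with dummy data by accident.
    if should_fill_seed_data():
        insert_fn()

# Demo: the callback only runs when the flag is truthy.
calls = []
os.environ["FILL_DB_WITH_SEED_DATA"] = "true"
on_startup(lambda: calls.append("filled"))
```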

olliestanley avatar May 08 '23 18:05 olliestanley

Hi there, I would like to work on this. thanks

revenge47 avatar May 09 '23 14:05 revenge47

> Hi there, I would like to work on this. thanks

Hi. You can see an example here of a function used in the data collection backend (`backend/`) to fill the DB with seed data for testing. We need to add a similar function to the inference backend (`inference/server/`), but using chat data instead of task data.

olliestanley avatar May 09 '23 15:05 olliestanley

Hi @olliestanley

I am interested in helping out with this issue. Adding a flag and a file path to the settings, then triggering an action on startup to load the data into an empty database, makes sense to me.

I am wondering how we can easily get data. I see the inference server has an export function to dump data into a file.

Do you think it would be a good solution to just load this exported data to prepopulate the database? If so, would someone be able to provide an example file to work with?
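If an exported dump were used, the loader would mostly just need to parse the export into chat records. A minimal sketch, assuming a JSON-lines export with `chat_id` and `messages` keys; the inference server's actual export format may differ:

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory

def load_exported_chats(path):
    # Parse one JSON object per line into a list of chat dicts.
    # The field names assumed here ("chat_id", "messages") are
    # illustrative, not a documented export schema.
    chats = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                chats.append(json.loads(line))
    return chats

# Demo with a throwaway file in the assumed format.
with TemporaryDirectory() as d:
    p = Path(d) / "export.jsonl"
    p.write_text('{"chat_id": "c1", "messages": [{"role": "prompter", "content": "hi"}]}\n')
    chats = load_exported_chats(p)
```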

wirthual avatar May 24 '23 17:05 wirthual

> Hi @olliestanley
>
> I am interested in helping out with this issue. I think adding a flag and a path to a file to the settings and triggering an action on startup to load data into an empty database makes sense to me.
>
> I am wondering how we easily can get data. I see the inference server has an export function to dump data into a file.
>
> Do you think it would be a good solution to just load this exported data to prepopulate the database? If so, would someone be able to provide an example file to work with?

We can consider something like this later on, but at least for now the data does not need to be too realistic; the functionality is what's important. You could do some format conversion on the data linked in the OP of this issue to use as temporary filler data.
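One way to do such a format conversion is to flatten the tree-shaped data-collection seed data into linear chats, emitting one chat per root-to-leaf path, since inference chats are linear while the collection data allows multiple replies per message. A sketch with illustrative field names (`role`, `text`, `replies`), not the real schema:

```python
def tree_to_chats(node, path=None):
    # Flatten a message tree into linear chats: every root-to-leaf
    # path in the tree becomes one chat transcript.
    path = (path or []) + [{"role": node["role"], "content": node["text"]}]
    replies = node.get("replies", [])
    if not replies:
        return [path]
    chats = []
    for child in replies:
        chats.extend(tree_to_chats(child, path))
    return chats

# Demo: one prompt with two alternative assistant replies
# becomes two separate two-message chats.
tree = {
    "role": "prompter", "text": "Hello",
    "replies": [
        {"role": "assistant", "text": "Hi!", "replies": []},
        {"role": "assistant", "text": "Hey there", "replies": []},
    ],
}
chats = tree_to_chats(tree)
```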

olliestanley avatar May 25 '23 09:05 olliestanley


Hey @wirthual, mind collaborating on this? I had a few issues with some implementation parts. Thanks

revenge47 avatar May 26 '23 16:05 revenge47

Hi,

Yes, happy to collaborate. Have you already made changes to the code?

I think a good first step would be to understand the schema of the inference database, so we know what data we need to create and how it's connected in the database.

I think the most important ones are DBMessage and DBChat, if I understand it correctly.
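For planning the fake data, the relationship that matters is roughly one `DBChat` owning many `DBMessage` rows linked by a chat ID and a parent-message ID. A simplified dataclass stand-in to illustrate that shape; the real models are SQLModel tables with many more columns:

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4

@dataclass
class DBMessage:
    # Simplified stand-in for the inference server's DBMessage:
    # each message belongs to a chat and may follow a parent message.
    chat_id: str
    role: str          # "prompter" or "assistant"
    content: str
    parent_id: Optional[str] = None
    id: str = field(default_factory=lambda: str(uuid4()))

@dataclass
class DBChat:
    # Simplified stand-in for DBChat: one chat owns many messages.
    id: str = field(default_factory=lambda: str(uuid4()))
    messages: list = field(default_factory=list)

    def add(self, role: str, content: str) -> DBMessage:
        # Chain each new message onto the previous one via parent_id.
        parent = self.messages[-1].id if self.messages else None
        msg = DBMessage(chat_id=self.id, role=role, content=content, parent_id=parent)
        self.messages.append(msg)
        return msg

chat = DBChat()
chat.add("prompter", "What is 2+2?")
chat.add("assistant", "4")
```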

wirthual avatar May 27 '23 15:05 wirthual

Hey, sorry for the late reply. I haven't made any commits, but I wrote some code showing how I think it should be implemented (not complete, however). I agree DBMessage and DBChat are the most important. Can I suggest we schedule an online meeting so I can show how I have partially implemented it?

revenge47 avatar May 30 '23 16:05 revenge47

Hi, did you push your current changes to your Open-Assistant branch? If so, I can have a look.

I also worked on implementing this functionality based on Polyfactory. I opened an MR so you can easily see what changes I made.

I was able to fill the database with fake data (only outside of Docker for now, though).

Here are the steps I took to try it:

1. Bring up Postgres and Redis so I can run the inference server:

   ```shell
   docker compose --profile inference up -d
   docker stop open-assistant-inference-server-1
   ```

2. In the config, point Postgres and Redis to the ports of the started containers:

   ```python
   redis_port: int = 6389
   postgres_port: str = "5434"
   insert_fake_data: bool = True
   ```

3. Start a local instance of the server:

   ```shell
   uvicorn main:app --reload
   ```

Then you should see output like:

```
2023-05-30 15:19:32.821 | WARNING  | main:insert_fake_data_event:157 - Done inserting fake data into database
```

I used a tool called Sequeler to check that the data in the Postgres instance was added as expected.

Let me know if that makes sense.
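For readers unfamiliar with Polyfactory: it builds model instances by generating random values from the fields' type hints. Here is a stdlib-only sketch of that idea against a toy dataclass (the MR itself uses Polyfactory's factories on the real models; `FakeMessage` and `build_fake` are illustrative names only):

```python
import random
import string
from dataclasses import dataclass, fields

@dataclass
class FakeMessage:
    # Toy model for illustration; the MR targets the real
    # DBChat/DBMessage tables instead.
    role: str
    content: str
    score: int

def build_fake(cls, rng=random):
    # Fill each dataclass field with a random value chosen from its
    # type annotation -- roughly what Polyfactory's factories automate.
    values = {}
    for f in fields(cls):
        if f.type in (str, "str"):
            values[f.name] = "".join(rng.choices(string.ascii_lowercase, k=8))
        elif f.type in (int, "int"):
            values[f.name] = rng.randint(0, 100)
        else:
            values[f.name] = None
    return cls(**values)

msg = build_fake(FakeMessage)
```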

wirthual avatar May 30 '23 22:05 wirthual

I just updated the code so it runs inside the Docker container. If you want to run it outside, you also need to adapt the path in the settings to point to the fake data.

@olliestanley what do you think of this approach?

wirthual avatar Jun 01 '23 19:06 wirthual

@wirthual sorry for this, I've been having technical issues. Please allow me to review it and I'll let you know.

revenge47 avatar Jun 27 '23 07:06 revenge47