Open-Assistant
Inference database dummy data fill ability
There's some dummy data here which was used for filling the data collection backend DB, could maybe be reused
In that backend we had a setting which was used on server start to determine whether to fill with data, see here
Hi there, I would like to work on this. thanks
Hi. You can see an example here of a function used in the data collection backend (backend/) to fill the DB with seed data for testing. We need a similar function added to the inference backend (inference/server/), but working with chat data instead of task data.
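The idea of a settings flag checked on server start can be sketched like this. This is a minimal stdlib-only illustration, not the actual Open-Assistant code: the names `Settings.insert_fake_data`, `FakeDB`, `seed_chats`, and `on_startup` are all assumptions for discussion, and a dict-based in-memory store stands in for the real database session.

```python
# Hedged sketch: all names here are illustrative, not the real
# inference server API. A dict-based store stands in for the DB.
from dataclasses import dataclass, field

@dataclass
class Settings:
    # Hypothetical flag, mirroring the data collection backend's approach
    insert_fake_data: bool = False

@dataclass
class FakeDB:
    chats: list = field(default_factory=list)

def seed_chats(db: FakeDB, n: int = 3) -> None:
    """Insert n placeholder chats, each with one prompter message."""
    for i in range(n):
        db.chats.append({
            "id": f"chat-{i}",
            "messages": [{"role": "prompter", "content": f"hello {i}"}],
        })

def on_startup(settings: Settings, db: FakeDB) -> None:
    # Only fill an empty database, and only when the flag is set
    if settings.insert_fake_data and not db.chats:
        seed_chats(db)

db = FakeDB()
on_startup(Settings(insert_fake_data=True), db)
print(len(db.chats))  # → 3
```

In the real server this check would run in a startup hook and write through the actual DB session instead of a list.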
Hi @olliestanley
I am interested in helping out with this issue. I think adding a flag and a path to a file to the settings and triggering an action on startup to load data into an empty database makes sense to me.
I am wondering how we easily can get data. I see the inference server has an export function to dump data into a file.
Do you think it would be a good solution to just load this exported data to prepopulate the database? If so, would someone be able to provide an example file to work with?
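The export-then-load idea could look roughly like this. The file format is an assumption (one JSON object per line with `chat_id` and `messages` keys); the actual export format of the inference server may differ, so this is only a sketch of the loading side.

```python
# Sketch of prepopulating from an exported dump. Assumed format:
# JSONL, one chat object per line with "chat_id" and "messages".
import io
import json

def load_dump(fp) -> list:
    """Parse a JSONL export into chat records, skipping blank lines."""
    chats = []
    for line in fp:
        line = line.strip()
        if line:
            chats.append(json.loads(line))
    return chats

# Stand-in for an exported file
sample = io.StringIO(
    '{"chat_id": "c1", "messages": [{"role": "prompter", "content": "hi"}]}\n'
    '{"chat_id": "c2", "messages": []}\n'
)
chats = load_dump(sample)
print(len(chats))  # → 2
```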
We can consider something like this later on, but at least for now the data does not need to be too realistic; the functionality is what's important. You could do some format conversion on the data linked in the OP of this issue as temporary filler data.
Hey @wirthual, mind collaborating on this? I ran into a few issues with some implementation parts. Thanks
Hi,
Yes happy to collaborate. Did you already do changes to the code?
I think a good first step would be to understand the schema of the inference database, so we know what data we need to create and how it's connected in the database.
I think the most important ones are DBMessage and DBChat, if I understand it correctly.
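To make the discussion concrete, here is an illustrative sketch of how chats and messages could relate. The real DBChat/DBMessage models in inference/server use SQLModel and have more fields, so the field names below are assumptions, not the actual schema.

```python
# Illustrative sketch only: the real DBChat/DBMessage models use
# SQLModel and have more fields. These names are assumptions.
from dataclasses import dataclass, field
from uuid import uuid4

@dataclass
class DBMessage:
    chat_id: str
    role: str      # e.g. "prompter" or "assistant"
    content: str
    id: str = field(default_factory=lambda: str(uuid4()))

@dataclass
class DBChat:
    id: str = field(default_factory=lambda: str(uuid4()))
    messages: list = field(default_factory=list)

    def add_message(self, role: str, content: str) -> DBMessage:
        # Messages point back at their chat via chat_id
        msg = DBMessage(chat_id=self.id, role=role, content=content)
        self.messages.append(msg)
        return msg

chat = DBChat()
chat.add_message("prompter", "Hello")
chat.add_message("assistant", "Hi there!")
print(len(chat.messages))  # → 2
```

The point is the one-to-many link: each DBMessage carries its chat's id, so seed data has to create the chat rows first and reference them from the message rows.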
Hey, sorry for the late reply. I haven't made any commits, but I wrote some code for how I think it should be implemented (not complete, however). I agree DBMessage and DBChat are the most important. Could we schedule an online meeting so I can show how I have (partially) implemented it?
Hi, did you push your current changes to your Open-Assistant branch? If so, I can have a look.
I also worked on implementing this functionality based on Polyfactory. I opened an MR so you can easily see what changes I made.
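Polyfactory builds model instances with auto-generated values for each field, which is what makes it handy for seed data. A stdlib-only sketch of the same idea (the `FakeMessage` model and its fields are illustrative, not the real schema, and real Polyfactory handles far more types):

```python
# Stdlib-only sketch of the factory idea behind Polyfactory:
# generate a value for each field based on its annotated type.
# FakeMessage and its fields are illustrative assumptions.
import random
import string
from dataclasses import dataclass, fields

@dataclass
class FakeMessage:
    id: str
    content: str
    score: int

def build(cls, rng: random.Random):
    """Construct cls with random values chosen per field type."""
    values = {}
    for f in fields(cls):
        if f.type in (str, "str"):
            values[f.name] = "".join(rng.choices(string.ascii_lowercase, k=8))
        elif f.type in (int, "int"):
            values[f.name] = rng.randint(0, 100)
    return cls(**values)

rng = random.Random(42)  # seeded for reproducible fake data
msg = build(FakeMessage, rng)
print(type(msg).__name__)  # → FakeMessage
```

With Polyfactory itself you declare a factory per model instead of writing `build` by hand, but the generated-values-per-field principle is the same.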
I was able to fill the database with fake data (only outside of Docker for now, though).
Here are the steps I did to try it:
- Bring up Postgres and Redis so I can run the inference server:

      docker compose --profile inference up -d
      docker stop open-assistant-inference-server-1

- In the config, point Postgres and Redis to the ports of the started containers:

      redis_port: int = 6389
      postgres_port: str = "5434"
      insert_fake_data: bool = True

- Then start a local instance of the server:

      uvicorn main:app --reload
Then you should see output like:
2023-05-30 15:19:32.821 | WARNING | main:insert_fake_data_event:157 - Done inserting fake data into database
I used a tool called Sequeler to check that the data was added to the Postgres instance as expected.
Let me know if that makes sense.
I just updated the code so it runs inside the Docker container. If you want to run it outside, you also need to adapt the path in the settings to point to the fake data.
@olliestanley what do you think of this approach?
@wirthual sorry for this, I've been having technical issues. Please allow me to review it and I'll let you know.