Open-Assistant Make backend seed data more realistic

Currently, if DEBUG_USE_SEED_DATA=True, the backend fills the database with seed data on startup. This is meant for debugging. Now, as we get more into debugging higher-level features, we need more realistic seed data, such as diverse tasks, long messages, existing instances of most data types, etc.

Jan 03 '23 10:01 yk

The first seed data was hard-coded in main.py. For longer texts, please read the initial message trees from a JSON file.

Jan 03 '23 13:01 andreaskoepf

I am interested in contributing to this project. Is the goal to make the dummy data more flexible? How about just reading it from a JSON file whose path is configurable in Settings?

Jan 03 '23 15:01 kenhktsui
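
For illustration, the suggestion above could look roughly like the following. This is only a sketch under assumptions: the `Settings` class here is a plain dataclass stand-in rather than the backend's actual settings object, and `DEBUG_SEED_DATA_PATH`, its default value, and `load_seed_data` are hypothetical names. Only `DEBUG_USE_SEED_DATA` comes from the issue itself.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class Settings:
    # Flag named in the issue description.
    DEBUG_USE_SEED_DATA: bool = False
    # Hypothetical setting: where the seed JSON file lives.
    DEBUG_SEED_DATA_PATH: Path = Path("test_data/seed_messages.json")


def load_seed_data(settings: Settings) -> List[Dict[str, Any]]:
    """Return the seed records from the configured JSON file.

    Returns an empty list when seeding is disabled, so callers can
    run this unconditionally on startup.
    """
    if not settings.DEBUG_USE_SEED_DATA:
        return []
    with settings.DEBUG_SEED_DATA_PATH.open(encoding="utf-8") as f:
        data = json.load(f)
    # This sketch assumes the file holds a top-level JSON array.
    if not isinstance(data, list):
        raise ValueError("seed file must contain a JSON array")
    return data
```

Keeping the path in settings means long sample texts can be edited in the JSON file without touching main.py.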

@kenhktsui Yeah that sounds reasonable :)

Having it load all JSON files under a test data tree might be useful too, if it doesn't already. Then we can split the data into files and group them in dirs by feature/purpose. It'd avoid enormous files and mean that we can easily identify data by functional area, delete things that aren't needed, update things as we figure out that we need more detail, etc.

Jan 03 '23 16:01 bitplane
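
A minimal sketch of the tree-loading idea, assuming each JSON file holds one self-contained piece of seed data and that the directory a file sits in identifies its feature area. The function name `load_seed_tree` and the grouping scheme are hypothetical, not something the backend is known to provide.

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any, Dict, List


def load_seed_tree(root: Path) -> Dict[str, List[Any]]:
    """Load every *.json file under root, grouped by containing directory.

    Keys are directory paths relative to root; files placed directly
    in root end up under the key ".".
    """
    grouped: Dict[str, List[Any]] = defaultdict(list)
    # Sort the paths so the load order is deterministic across runs.
    for path in sorted(root.rglob("*.json")):
        group = path.parent.relative_to(root).as_posix()
        with path.open(encoding="utf-8") as f:
            grouped[group].append(json.load(f))
    return dict(grouped)
```

With this shape, removing a feature's seed data is just deleting its directory, and no loader code needs to change.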

@bitplane Yeah, that's what I am thinking as well, so that the data is easier to manage and decoupled from main.py. I propose this directory structure:

backend/test_data/
    generic/
    use_case1/
    use_case2/

I could take this PR.

Jan 03 '23 16:01 kenhktsui

@kenhktsui great! assigning this to you

Jan 03 '23 17:01 AbdBarho

Re-opening this. We have a better method to place seed data, but we don't have realistic seed data yet.

Jan 10 '23 20:01 yk