Open-Assistant
Make backend seed data more realistic
Currently, if DEBUG_USE_SEED_DATA=True, the backend fills the database with seed data on startup. This is meant for debugging. Now, as we get more into debugging higher-level features, we need more realistic seed data, such as diverse tasks, long messages, existing instances of most data types, etc.
The first seed data was hard-coded in main.py. For longer texts, please read the initial message trees from a JSON file.
I am interested in contributing to this project. Is the goal to make the dummy data more flexible? How about just reading it from a JSON file whose path is configurable in Settings?
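A minimal sketch of that idea, assuming the path comes from an environment variable. The name DEBUG_SEED_DATA_PATH, the default path, and the function name are hypothetical and not taken from the backend's actual Settings:

```python
import json
import os
from pathlib import Path

# Hypothetical setting name; the real Settings field may be called something else.
SEED_DATA_ENV_VAR = "DEBUG_SEED_DATA_PATH"


def load_seed_data(default_path: str = "backend/test_data/seed.json"):
    """Read seed data from the JSON file named by the environment variable.

    Falls back to default_path (also an assumption) when the variable is unset.
    """
    path = Path(os.environ.get(SEED_DATA_ENV_VAR, default_path))
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

Keeping the path out of main.py means the seed data can be swapped without touching code.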
@kenhktsui Yeah that sounds reasonable :)
Having it load all JSON files under a test data tree might be useful too, if it doesn't already. Then we can split the data into files and group them in directories by feature or purpose. That would avoid enormous files and mean that we can easily identify data by functional area, delete things that aren't needed, and update things as we figure out that we need more detail.
@bitplane Yeah, that's what I am thinking as well: it's easier to manage, and it decouples the seed data from main.py. I propose this directory structure:
backend/test_data/
- generic/
- use_case1/
- use_case2/
I could take this PR.
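A rough sketch of how such a tree could be loaded, assuming every file under the root is a standalone JSON document and that the first directory level names the group. The function name load_seed_tree is hypothetical, not existing backend code:

```python
import json
from collections import defaultdict
from pathlib import Path


def load_seed_tree(root):
    """Load every *.json file under root, grouped by top-level directory.

    Files placed directly in root are collected under the empty-string key.
    """
    root = Path(root)
    grouped = defaultdict(list)
    # sorted() keeps the load order deterministic across platforms.
    for json_file in sorted(root.rglob("*.json")):
        rel = json_file.relative_to(root)
        group = rel.parts[0] if len(rel.parts) > 1 else ""
        with json_file.open(encoding="utf-8") as f:
            grouped[group].append(json.load(f))
    return dict(grouped)
```

With the layout above, load_seed_tree("backend/test_data") would return one list per directory such as generic/ or use_case1/, so a group can be deleted or extended without touching the others.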
@kenhktsui great! assigning this to you
Re-opening this. We now have a better mechanism for placing seed data, but we don't have realistic seed data yet.