
Make backend seed data more realistic

Open yk opened this issue 2 years ago • 5 comments

Currently, if DEBUG_USE_SEED_DATA=True, the backend fills the database with seed data on startup. This is meant for debugging. Now, as we get more into debugging higher-level features, we need more realistic seed data, such as diverse tasks, long messages, existing instances of most data types, etc.

yk avatar Jan 03 '23 10:01 yk

The first seed data was hard-coded in main.py. For longer texts, please read the initial message trees from a JSON file.

andreaskoepf avatar Jan 03 '23 13:01 andreaskoepf

I am interested in contributing to this project. Is the goal to make the dummy data more flexible? How about just reading it from a JSON file whose path is configurable in Settings?
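A rough sketch of that idea, just to make it concrete. `DEBUG_USE_SEED_DATA` is the existing flag; the `DEBUG_SEED_DATA_PATH` field, the `Settings` dataclass shape, and the `load_seed_data` helper are hypothetical names for illustration, not the backend's actual code. It also assumes the file holds a JSON array of records.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any


@dataclass
class Settings:
    # Existing flag from the issue description.
    DEBUG_USE_SEED_DATA: bool = False
    # Hypothetical setting: where the seed JSON file lives.
    DEBUG_SEED_DATA_PATH: str = "seed_data.json"


def load_seed_data(settings: Settings) -> list[dict[str, Any]]:
    """Return seed records from the configured file, or [] if seeding is off."""
    if not settings.DEBUG_USE_SEED_DATA:
        return []
    with Path(settings.DEBUG_SEED_DATA_PATH).open(encoding="utf-8") as f:
        data = json.load(f)
    # Assumption: the file contains a JSON array of record objects.
    if not isinstance(data, list):
        raise ValueError("seed file must contain a JSON array of records")
    return data
```

The startup code would then call `load_seed_data(settings)` instead of building the records inline, so longer texts can live in the file rather than in main.py.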

kenhktsui avatar Jan 03 '23 15:01 kenhktsui

@kenhktsui Yeah that sounds reasonable :)

Having it load all JSON files under a test data tree might be useful too, if it doesn't already. Then we can split the data into files and group them in dirs by feature/purpose. It'd avoid enormous files and mean that we can easily identify data by functional area, delete things that aren't needed, update things as we figure out that we need more detail, etc.

bitplane avatar Jan 03 '23 16:01 bitplane

@bitplane Yeah, that's what I was thinking as well: it makes the data easier to manage and decouples it from main.py. I propose this directory structure under backend/test_data/:

  • generic/
  • use_case1/
  • use_case2/
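Loading from that layout could look roughly like the sketch below. The `load_seed_tree` helper is a hypothetical name, and it assumes each file holds either a JSON array of records or a single record; records are grouped by the top-level directory they sit in (generic, use_case1, ...).

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Any


def load_seed_tree(root: Path) -> dict[str, list[Any]]:
    """Collect records from every *.json file under root, keyed by top-level dir."""
    grouped: dict[str, list[Any]] = defaultdict(list)
    # Sort for a deterministic load order across platforms.
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        # Files placed directly under root have no group directory; use "".
        group = rel.parts[0] if len(rel.parts) > 1 else ""
        with path.open(encoding="utf-8") as f:
            content = json.load(f)
        # Accept either a JSON array of records or one record per file.
        grouped[group].extend(content if isinstance(content, list) else [content])
    return dict(grouped)
```

Calling `load_seed_tree(Path("backend/test_data"))` would return one list per directory, so a feature's data can be loaded, replaced, or deleted without touching the others.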

I could take this PR.

kenhktsui avatar Jan 03 '23 16:01 kenhktsui

@kenhktsui great! assigning this to you

AbdBarho avatar Jan 03 '23 17:01 AbdBarho

Re-opening this. We now have a better method for placing seed data, but we don't have realistic seed data yet.

yk avatar Jan 10 '23 20:01 yk