Open-Assistant
Proposal: import message tree state directly from input file
The code around here shows that only two message tree states (BACKLOG_RANKING and RANKING) can be imported into the database:
https://github.com/LAION-AI/Open-Assistant/blob/dc105dff36ac70ecf0959ac9f06df66fe88258e9/backend/import.py#L125
Could the import script add a "bypass_state" flag so that the tree_state attribute is imported directly from the input file?
I have an external data source with some tiny message trees in which most messages have no siblings, so those trees need to be expanded (grown). I can determine the tree states myself, so the import script could simply pass my tree states through to the database.
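A minimal sketch of what such a flag could look like, assuming each input tree record may carry a "tree_state" string. The names resolve_tree_state, TreeState and bypass_state, the chosen state values, and the BACKLOG_RANKING default are all hypothetical; this is not the code in import.py.

```python
import json
from enum import Enum
from typing import Optional


class TreeState(str, Enum):
    # Hypothetical subset of states: only BACKLOG_RANKING and RANKING are named
    # in the issue; GROWING stands in for "needs to be expanded (grown)".
    GROWING = "growing"
    RANKING = "ranking"
    BACKLOG_RANKING = "backlog_ranking"


def resolve_tree_state(
    tree: dict, bypass_state: bool, default: TreeState = TreeState.BACKLOG_RANKING
) -> TreeState:
    """Pick the state to store for an imported tree.

    With bypass_state=True, a "tree_state" field in the input record is used
    (after validation); otherwise the importer's default is kept.
    """
    raw: Optional[str] = tree.get("tree_state")
    if bypass_state and raw is not None:
        # TreeState(raw) raises ValueError for unknown names instead of
        # silently writing an invalid state to the database.
        return TreeState(raw)
    return default


# Usage: one JSONL line describing a tree that still needs to grow.
tree = json.loads('{"message_tree_id": "t1", "tree_state": "growing"}')
print(resolve_tree_state(tree, bypass_state=True).value)   # growing
print(resolve_tree_state(tree, bypass_state=False).value)  # backlog_ranking
```

Validating against an enum keeps the bypass path from storing typos such as "grwoing", which would otherwise leave a tree in a state no task handler recognizes.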
We need the import script to load synthetic messages into the DB, so the existing functionality needs to stay in some form.
What you suggest would be possible. You could probably also import the messages without inserting entries into the message_tree_state table and rely on TreeManager.ensure_tree_states() (which is called during backend startup) to find the correct states for the inserted trees.
If you want to overhaul the import script, please go ahead!
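To illustrate that alternative: if only messages are imported, the trees left without a message_tree_state row are exactly the ones a startup pass has to fill in. The sketch below assumes messages and state rows are plain dicts that both carry a message_tree_id field; the function names are hypothetical, and it only finds those trees. It does not reproduce how TreeManager.ensure_tree_states() actually decides which state to assign.

```python
from collections import defaultdict


def group_by_tree(messages: list[dict]) -> dict[str, list[dict]]:
    """Bucket imported message rows by the tree they belong to."""
    trees: dict[str, list[dict]] = defaultdict(list)
    for message in messages:
        trees[message["message_tree_id"]].append(message)
    return dict(trees)


def trees_missing_state(messages: list[dict], tree_state_rows: list[dict]) -> list[str]:
    """Return ids of trees that have messages but no tree-state row yet."""
    known = {row["message_tree_id"] for row in tree_state_rows}
    return sorted(set(group_by_tree(messages)) - known)


# Usage: t1 already has a state row, t2 was imported as messages only.
messages = [
    {"message_id": "a", "message_tree_id": "t1"},
    {"message_id": "b", "message_tree_id": "t1"},
    {"message_id": "c", "message_tree_id": "t2"},
]
tree_state_rows = [{"message_tree_id": "t1", "state": "ranking"}]
print(trees_missing_state(messages, tree_state_rows))  # ['t2']
```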
Thank you for the hint! I'm working on it.
https://github.com/LAION-AI/Open-Assistant/pull/1947 works on my own data. I've tested it on two JSONL files (one containing only message nodes, one containing only tree nodes), and both worked.
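For reference, a loader that accepts both kinds of input might look roughly like the sketch below. It assumes tree records are nested objects with a "prompt" node whose children sit under "replies", and that any line without a "prompt" key is a single flat message; these field names and the helper names are assumptions for illustration, not necessarily the format the PR expects.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_messages(node: dict, parent_id: Optional[str] = None) -> Iterator[dict]:
    """Depth-first walk of a nested tree node, yielding flat message dicts."""
    flat = {key: value for key, value in node.items() if key != "replies"}
    # Fill in the parent link from the nesting if the node does not carry one.
    flat.setdefault("parent_id", parent_id)
    yield flat
    for reply in node.get("replies") or []:
        yield from iter_messages(reply, node["message_id"])


def load_jsonl(lines: Iterable[str]) -> tuple[list[dict], list[dict]]:
    """Split JSONL lines into tree records and a flat list of messages."""
    trees: list[dict] = []
    messages: list[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if "prompt" in record:  # assumed marker of a tree record
            trees.append(record)
            messages.extend(iter_messages(record["prompt"]))
        else:  # assumed to be a single message record
            messages.append(record)
    return trees, messages


tree_line = json.dumps({
    "message_tree_id": "t1",
    "tree_state": "growing",
    "prompt": {"message_id": "p", "text": "Hi",
               "replies": [{"message_id": "a", "text": "Hello"}]},
})
message_line = json.dumps({"message_id": "m1", "parent_id": None, "text": "Q"})
trees, messages = load_jsonl([tree_line, message_line])
print(len(trees), [m["message_id"] for m in messages])  # 1 ['p', 'a', 'm1']
```

Flattening tree records into the same message list lets both input files go through one insert path, while the tree records stay available for their tree_state values.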