treelib
Add a more general interface to manipulate importing/exporting
Exporting and reloading tree data is a common scenario. This feature aims to provide a unified interface for processing different data formats, including (but not limited to):
- JSON: as in #75, #78 and #73
- Graphviz dot format: as in plugins/export_to_dot
- YAML: to be added
Did anything come of this? Is there a way to populate a tree in treelib from JSON? Thanks.
Would a general interface not just mean a couple of:
- to_dict(), to_json(), to_graphviz(), ... instance methods,
- along with from_dict(), from_json(), from_graphviz(), ... classmethods?
Most of these methods already exist, so you would just have to name them consistently. Furthermore, they should share the same signature.
Or are you thinking about putting these functions into new modules? For example:
- treelib.save.to_dict(tree, ...), treelib.save.to_json(tree, ...), treelib.save.to_graphviz(tree, ...)
- treelib.load.from_dict(dict_), treelib.load.from_json(json_file), treelib.load.from_graphviz(dot_file)
I could work on that, if you need help.
I'll drop another export function here, just in case somebody wants to do the same. I wanted to convert a tree into a binary tree using the left-child, right-sibling (LCRS) method. As treelib can't distinguish between left and right children, I used the binarytree package.
```python
from typing import Dict, Optional, Tuple

import binarytree as bt
import treelib as tl


def to_left_child_right_sibling(tree: tl.Tree) -> Tuple[bt.Node, Dict[int, str]]:
    """Converts a treelib.Tree object to a binarytree.

    The binarytree package is used for storing the new LCRS binary tree, as
    treelib trees can't distinguish between left and right children. The
    binarytree.Node class expects numeric node values, so the treelib node
    identifiers are assumed to be integers; the tags/labels/names of the
    nodes are returned in a dictionary.
    """

    def to_lcrs(tree: tl.Tree, root_id: Optional[int] = None) -> bt.Node:
        """Recursively constructs an LCRS tree starting from the node at root_id."""
        if root_id is None:
            root_id = tree.root
        # construct a root node
        root = bt.Node(root_id)
        # tree.children() returns the child Node objects in insertion order
        children = tree.children(root_id)
        # if it does not have any children, we return it (recursion end)
        if not children:
            return root
        # otherwise we recursively construct LCRS trees of every child ...
        sub_trees = [to_lcrs(tree, child.identifier) for child in children]
        # ... and link them together as right children (siblings)
        for i in range(1, len(sub_trees)):
            sub_trees[i - 1].right = sub_trees[i]
        # the first LCRS tree is now the left child of our root
        root.left = sub_trees[0]
        return root

    id2name = {nid: node.tag for nid, node in tree.nodes.items()}
    root = to_lcrs(tree)
    return root, id2name
```
Are there functions to load JSON data into a tree yet?
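As far as this thread shows, there is no loader yet. treelib's Tree.to_json() and Tree.to_dict() do export a nested structure, though, so one possible workaround is to flatten that structure back into node records. The sketch below assumes the nested shape is: an inner node is {tag: {"children": [...]}} (plus "data" if exported with data), and a leaf is a bare tag string, or {tag: {"data": ...}} when data was exported. records_from_nested is a hypothetical helper, not part of treelib. Note the limitation: this format keeps only tags, not identifiers, so fresh integer ids are assigned.

```python
import json
from typing import Any, List, Optional, Tuple

# (new_id, tag, parent_id, data); data is None when it was not exported
Record = Tuple[int, str, Optional[int], Any]


def records_from_nested(nested: Any) -> List[Record]:
    """Flatten the nested dict assumed for treelib's to_dict()/to_json() output.

    Assumed shape: an inner node is {tag: {"children": [...], "data": ...}};
    a leaf is a bare tag string, or {tag: {"data": ...}} if data was exported.
    The format holds no identifiers, so fresh integer ids are assigned.
    """
    records: List[Record] = []

    def walk(item: Any, parent: Optional[int]) -> None:
        new_id = len(records)
        if isinstance(item, str):
            # leaf exported without data: only its tag survives
            records.append((new_id, item, parent, None))
            return
        # every node dict has exactly one key: the node's tag
        tag, body = next(iter(item.items()))
        records.append((new_id, tag, parent, body.get("data")))
        for child in body.get("children", []):
            walk(child, new_id)

    walk(nested, None)
    return records


if __name__ == "__main__":
    text = '{"root": {"children": ["a", {"b": {"children": ["c"]}}]}}'
    for record in records_from_nested(json.loads(text)):
        print(record)
```

Because the walk is pre-order, every parent is emitted before its children, so the records could be replayed in order with tree.create_node(tag, identifier=new_id, parent=parent_id, data=data) on an empty Tree.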
There are three types of information that should be stored to serialize/deserialize a tree instance: tree information, node information, and the node hierarchy.
More specifically:
- tree identifier
- node "hierarchy" (node bpointer/fpointers)
- node base attributes: tag, identifier
- node data (requires constraints, since some objects aren't serializable: e.g. a Python set for JSON serialization)
Then, in case of inheritance:
- tree node_class, in case of a node class inheriting from treelib.Node
- other tree attributes, in case of a tree class inheriting from treelib.Tree
- other node attributes, in case of a node class inheriting from treelib.Node
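To make the "hierarchy" and node data points concrete, here is a minimal sketch assuming one flat record per node with identifier, tag, parent and data keys (the record layout and the function names are hypothetical). Only parent pointers are stored, since the child lists can be derived from them on load, and a json.dumps default hook illustrates one possible constraint on node data: sets are written as sorted lists.

```python
import json
from typing import Any, Dict, List


def _json_default(obj: Any) -> Any:
    """Fallback hook for json.dumps: sets become sorted lists, anything else fails."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)  # sorted for deterministic output; elements must be comparable
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def dump_records(records: List[Dict[str, Any]]) -> str:
    """Serialize flat node records (identifier, tag, parent, data) to JSON text."""
    return json.dumps(records, default=_json_default)


def children_map(records: List[Dict[str, Any]]) -> Dict[Any, List[Any]]:
    """Rebuild child lists (the fpointers) from the stored parent pointers (bpointers)."""
    children: Dict[Any, List[Any]] = {r["identifier"]: [] for r in records}
    for r in records:
        if r["parent"] is not None:
            children[r["parent"]].append(r["identifier"])
    return children
```

The set conversion is lossy: after reloading, data holds a list rather than a set, which is why node data needs explicit constraints (or a matching custom decoder) for a format like JSON.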
Without going into the details of a specific output format, an approach allowing inheritance could be to have distinct methods that can be overridden:
- a treelib.Tree._serialize_metadata method, serializing tree information (identifier, plus other tree attributes in case of inheritance),
- a treelib.Tree._serialize_hierarchy method, serializing the hierarchy (extracted from bpointer/fpointers),
- a treelib.Node._serialize_node method, serializing node information (tag, identifier, data, etc.).
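The hook design above can be sketched as follows. The three hook names come from the proposal; SketchTree, SketchNode, ColoredNode and export are hypothetical stand-ins (not treelib's API) so that the sketch runs without treelib. The point is that a subclass adds an attribute by overriding a single hook, while export stays format-agnostic and returns a plain Python object.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for treelib.Node; only the base attributes."""
    tag: str
    identifier: Any
    data: Any = None

    def _serialize_node(self) -> Dict[str, Any]:
        # base attributes; subclasses extend the returned dict
        return {"tag": self.tag, "identifier": self.identifier, "data": self.data}


@dataclass
class SketchTree:
    """Hypothetical stand-in for treelib.Tree: nodes plus parent pointers."""
    identifier: Any = None
    nodes: Dict[Any, SketchNode] = field(default_factory=dict)
    parent_of: Dict[Any, Optional[Any]] = field(default_factory=dict)

    def add(self, node: SketchNode, parent: Optional[Any] = None) -> None:
        self.nodes[node.identifier] = node
        self.parent_of[node.identifier] = parent

    def _serialize_metadata(self) -> Dict[str, Any]:
        # tree-level information; a Tree subclass would add its own attributes
        return {"identifier": self.identifier}

    def _serialize_hierarchy(self) -> List[List[Any]]:
        # [identifier, parent] pairs; child lists can be rebuilt from them on load
        return [[nid, parent] for nid, parent in self.parent_of.items()]

    def export(self) -> Dict[str, Any]:
        """Combine the three hooks into one serializable Python object."""
        return {
            "tree": self._serialize_metadata(),
            "hierarchy": self._serialize_hierarchy(),
            "nodes": [node._serialize_node() for node in self.nodes.values()],
        }


@dataclass
class ColoredNode(SketchNode):
    """Example Node subclass: one extra attribute, one overridden hook."""
    color: str = "black"

    def _serialize_node(self) -> Dict[str, Any]:
        out = super()._serialize_node()
        out["color"] = self.color
        return out


if __name__ == "__main__":
    tree = SketchTree(identifier="t1")
    tree.add(SketchNode("root", 1))
    tree.add(ColoredNode("leaf", 2, color="red"), parent=1)
    print(json.dumps(tree.export(), indent=2))
```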
Note: for those not requiring a specific serialization format, consider using the Python pickle module: https://docs.python.org/3/library/pickle.html
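A minimal pickle round trip, as a sketch: pickle stores the whole object graph, including subclass attributes, so none of the inheritance concerns above apply. A treelib.Tree should pickle like any other plain Python object graph (an assumption, not verified here). The helper names are hypothetical; the trade-offs are that the output is Python-specific and that, per the pickle documentation, untrusted data must never be unpickled.

```python
import io
import pickle
from typing import Any, BinaryIO


def save_pickle(obj: Any, stream: BinaryIO) -> None:
    """Write any picklable object (e.g. a treelib.Tree) to a binary stream."""
    pickle.dump(obj, stream, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(stream: BinaryIO) -> Any:
    """Read an object back; only use on trusted data (unpickling can run code)."""
    return pickle.load(stream)


if __name__ == "__main__":
    buf = io.BytesIO()
    save_pickle({"root": ["a", "b"]}, buf)
    buf.seek(0)  # rewind before reading back
    print(load_pickle(buf))
```

With a file instead of an in-memory buffer, open it in binary mode: open("tree.pkl", "wb") for saving and open("tree.pkl", "rb") for loading.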
I think it would be appropriate to fold https://github.com/caesar0301/treelib/issues/95 (ability to export to a stream) into the solution of this issue right away. @villmow are you still interested in working on this, or do you need help?
I didn't know the Graphviz dot format, but from what I understand, we shouldn't try to handle it the same way as the YAML/JSON formats, since it is much less generic.
For JSON/YAML and the like, we could have some kind of common _export method, whose goal would be to provide a serializable Python object, and then apply either a JSON or a YAML serializer.
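The second half of that design (a format-agnostic serializer step that writes to a stream, as requested in #95) could look like the sketch below. SERIALIZERS, export_to_stream and export_to_string are hypothetical names; the input is assumed to be the plain object an _export-style method would return. Only JSON is registered so that the sketch needs nothing beyond the standard library.

```python
import io
import json
from typing import Any, Callable, Dict, TextIO

# Hypothetical registry: format name -> function(obj, stream) writing text.
Serializer = Callable[[Any, TextIO], None]
SERIALIZERS: Dict[str, Serializer] = {
    "json": lambda obj, stream: json.dump(obj, stream, indent=2),
}


def export_to_stream(obj: Any, stream: TextIO, fmt: str = "json") -> None:
    """Write the serializable object produced by an _export-style step to a stream."""
    try:
        serializer = SERIALIZERS[fmt]
    except KeyError:
        raise ValueError(f"unknown format {fmt!r}; known: {sorted(SERIALIZERS)}") from None
    serializer(obj, stream)


def export_to_string(obj: Any, fmt: str = "json") -> str:
    """Same code path as export_to_stream, but collected in memory."""
    buf = io.StringIO()
    export_to_stream(obj, buf, fmt)
    return buf.getvalue()
```

Any text stream works, e.g. sys.stdout or a file opened with open("tree.json", "w"). If PyYAML is installed, YAML could be registered the same way, e.g. SERIALIZERS["yaml"] = lambda obj, stream: yaml.safe_dump(obj, stream), without touching the export step. A dot exporter, being less generic, would stay outside this registry.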
@caesar0301 before I go further and implement the json/yaml serialization with stream output, do you have an opinion on this design: https://github.com/caesar0301/treelib/pull/133
Hello, has from_json or anything similar been implemented yet? Since this request is still open, I assume not?
https://anytree.readthedocs.io/en/latest/index.html