Add a more general interface to manipulate importing/exporting

Open caesar0301 opened this issue 7 years ago • 10 comments

Exporting and reloading tree data is a common scenario. This feature aims to provide a unified interface for handling different data formats, including (but not limited to):

  • Json: as in #75, #78 and #73
  • Graphviz dot format: as in plugins/export_to_dot
  • Yaml: to be added

caesar0301 avatar Dec 01 '17 04:12 caesar0301

Did anything come of this? Is there a way to populate a tree in treelib from JSON? Thanks.

mbadros avatar Feb 03 '19 17:02 mbadros

Would a general interface not just mean a couple of:

  • to_dict(), to_json(), to_graphviz(), ... instance methods.
  • along with from_dict(), from_json(), from_graphviz(), ... classmethods.

Most of these methods exist already, so you'd just have to name them properly. Furthermore they should have the same signature.

Or do you think about putting these functions into new modules? For example:

  • treelib.save.to_dict(tree, ...), treelib.save.to_json(tree, ...), treelib.save.to_graphviz(tree, ...)
  • treelib.load.from_dict(dict_), treelib.load.from_json(json_file), treelib.load.from_graphviz(dot_file)

I could work on that, if you need help.
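
A minimal sketch of the first option, pairing each instance method with a classmethod so that export and import round-trip. `MiniTree` and its `add` method are hypothetical stand-ins written for this illustration, not treelib's classes, and the nested `id`/`tag`/`children` layout is an assumed format rather than the one treelib emits:

```python
import json
from typing import Any, Dict, List, Optional


class MiniTree:
    """Hypothetical stand-in for a tree class; not treelib's implementation."""

    def __init__(self) -> None:
        self.root: Optional[str] = None
        self.tags: Dict[str, str] = {}            # identifier -> tag
        self.children: Dict[str, List[str]] = {}  # identifier -> ordered child ids

    def add(self, tag: str, identifier: str, parent: Optional[str] = None) -> None:
        self.tags[identifier] = tag
        self.children[identifier] = []
        if parent is None:
            self.root = identifier
        else:
            self.children[parent].append(identifier)

    def to_dict(self, nid: Optional[str] = None) -> Dict[str, Any]:
        """Export the subtree rooted at nid (default: the root) as nested dicts."""
        nid = self.root if nid is None else nid
        return {
            "id": nid,
            "tag": self.tags[nid],
            "children": [self.to_dict(child) for child in self.children[nid]],
        }

    @classmethod
    def from_dict(cls, payload: Dict[str, Any]) -> "MiniTree":
        """Rebuild a tree from the structure produced by to_dict()."""
        tree = cls()
        stack = [(payload, None)]
        while stack:
            item, parent = stack.pop()
            tree.add(item["tag"], item["id"], parent)
            # Push children reversed so they are added in their original order.
            stack.extend((child, item["id"]) for child in reversed(item["children"]))
        return tree

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    @classmethod
    def from_json(cls, text: str) -> "MiniTree":
        return cls.from_dict(json.loads(text))


tree = MiniTree()
tree.add("Root", "r")
tree.add("Left", "a", parent="r")
tree.add("Right", "b", parent="r")
restored = MiniTree.from_json(tree.to_json())
```

Building `to_json`/`from_json` on top of `to_dict`/`from_dict` keeps the format-specific part to a single line each, which is one way the "same signature" requirement could be met.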

villmow avatar Mar 11 '19 08:03 villmow

I'll drop another exporting function here, just in case somebody wants to do the same. I wanted to convert a tree into a binary tree using the left-child, right-sibling (LCRS) method. As treelib can't distinguish between left and right children, I used the binarytree package.

from typing import Dict, Optional, Tuple

import binarytree as bt
import treelib as tl

def to_left_child_right_sibling(tree: tl.Tree) -> Tuple[bt.Node, Dict[int, str]]:
    """ Converts a treelib.Tree object to a binarytree.
    
    The binarytree package is used for storing the new LCRS-binary tree, as 
    Treelib trees can't distinguish between left and right children. The 
    binarytree.Node class expects numeric node values (identifiers), the
    tags/labels/names of the nodes are returned in a dictionary. 
    """
    
    def to_lcrs(tree: tl.Tree, root_id: Optional[int] = None) -> bt.Node:
        """Recursively constructs an LCRS tree starting from the node at root_id"""

        if root_id is None:
            root_id = tree.root

        # construct a root node
        root = bt.Node(root_id)

        # if it does not have any children, we return it (recursion end)
        if not tree.children(root_id):
            return root

        # otherwise we recursively construct LCRS trees for every child ...
        sub_trees = [to_lcrs(tree, child_id) for child_id in tree[root_id].fpointer]

        # ... and chain them together as right children (siblings)
        for i in range(1, len(sub_trees)):
            sub_trees[i - 1].right = sub_trees[i]

        # the first lcrs tree is now the left child of our root
        root.left = sub_trees[0] if len(sub_trees) > 0 else None

        return root
    
    id2name = {i: node.tag for i, node in tree.nodes.items()}
    root = to_lcrs(tree)

    return root, id2name

villmow avatar Mar 11 '19 08:03 villmow

Is there a function to load JSON data into a tree yet?

burawit avatar Nov 10 '19 15:11 burawit

There are 3 types of information that should be stored to serialize/deserialize a tree instance: tree information, node information, nodes hierarchy.

More specifically:

  • tree identifier
  • node "hierarchy" (nodes bpointer/fpointers)
  • node base attributes: tag, identifier
  • node data (requires constraints, since some objects aren't serializable: e.g. a Python set for JSON serialization)

Then, in case of inheritance:

  • tree node_class in case of node class inheriting from treelib.Node
  • tree other attributes in case of tree class inheriting from treelib.Tree
  • node other attributes in case of node class inheriting from treelib.Node
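
To illustrate the node data constraint, here is a small sketch using only the standard `json` module. The `encode_extra` helper and the sample `node_data` are made up for this example; converting sets to sorted lists is just one possible policy:

```python
import json
from typing import Any


def encode_extra(value: Any) -> Any:
    """Fallback for json.dumps: turn sets into sorted lists, reject the rest."""
    if isinstance(value, (set, frozenset)):
        return sorted(value)
    raise TypeError(f"{type(value).__name__} is not JSON serializable")


# Hypothetical node payload containing a set.
node_data = {"labels": {"b", "a"}, "weight": 3}

# Without a fallback, json.dumps rejects the set.
try:
    json.dumps(node_data)
    plain_dump_failed = False
except TypeError:
    plain_dump_failed = True

# With the fallback, the set is written out as a list.
encoded = json.dumps(node_data, default=encode_extra)
```

Note that loading `encoded` back yields a list, not a set, so a loader would need its own convention if the original type matters.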

Without going into the details of a specific output format, an approach allowing inheritance could be to have distinct methods that can be overridden:

  • treelib.Tree _serialize_metadata method, serializing tree information (identifier, tree other attributes in case of inheritance)
  • treelib.Tree _serialize_hierarchy method, serializing hierarchy (extracted from bpointer/fpointers)
  • treelib.Node _serialize_node method, serializing node information (tag, identifier, data etc)
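
The three hooks could be sketched roughly as follows. `SketchTree`, `SketchNode`, `WeightedNode`, and the `serialize` method are hypothetical stand-ins for illustration only; they are not treelib's classes, and the output layout is an assumption:

```python
from typing import Any, Dict, Optional


class SketchNode:
    """Hypothetical stand-in for a node class; not treelib.Node."""

    def __init__(self, tag: str, identifier: str, data: Any = None) -> None:
        self.tag = tag
        self.identifier = identifier
        self.data = data

    def _serialize_node(self) -> Dict[str, Any]:
        return {"tag": self.tag, "identifier": self.identifier, "data": self.data}


class SketchTree:
    """Hypothetical stand-in for a tree class; not treelib.Tree."""

    def __init__(self, identifier: str) -> None:
        self.identifier = identifier
        self.nodes: Dict[str, SketchNode] = {}
        self.parents: Dict[str, Optional[str]] = {}  # child id -> parent id

    def add(self, node: SketchNode, parent: Optional[str] = None) -> None:
        self.nodes[node.identifier] = node
        self.parents[node.identifier] = parent

    def _serialize_metadata(self) -> Dict[str, Any]:
        return {"identifier": self.identifier}

    def _serialize_hierarchy(self) -> Dict[str, Optional[str]]:
        return dict(self.parents)

    def serialize(self) -> Dict[str, Any]:
        # Format-agnostic: the result is plain dicts, ready for any serializer.
        return {
            "tree": self._serialize_metadata(),
            "hierarchy": self._serialize_hierarchy(),
            "nodes": [n._serialize_node() for n in self.nodes.values()],
        }


class WeightedNode(SketchNode):
    """Subclass with an extra attribute; only its own hook changes."""

    def __init__(self, tag: str, identifier: str, weight: float) -> None:
        super().__init__(tag, identifier)
        self.weight = weight

    def _serialize_node(self) -> Dict[str, Any]:
        payload = super()._serialize_node()
        payload["weight"] = self.weight
        return payload


tree = SketchTree("demo")
tree.add(SketchNode("Root", "r"))
tree.add(WeightedNode("Leaf", "x", weight=2.5), parent="r")
snapshot = tree.serialize()
```

The subclass only overrides `_serialize_node`; the tree-level code that assembles metadata, hierarchy, and nodes stays untouched.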

Note: if you don't need a specific serialization format, consider using Python's pickle module: https://docs.python.org/3/library/pickle.html

leonardbinet avatar Dec 08 '19 14:12 leonardbinet

I think it would be appropriate to include https://github.com/caesar0301/treelib/issues/95 (ability to export to a stream) directly in the solution to this issue. @villmow are you still interested in working on this, or do you need help?

leonardbinet avatar Dec 15 '19 11:12 leonardbinet

I didn't know the graphviz dot format, but from what I understand, I think we shouldn't try to handle it in the same way as the YAML and JSON formats, since it is much less generic.

For json/yaml and such, we could have some kind of common _export method, whose goal would be to provide a serializable python object, and then apply either a JSON or YAML serializer.
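
A rough sketch of that split, also writing to a stream as requested in #95. The `export_tree` function and the sample `payload` are hypothetical; the payload stands in for whatever a common `_export` method would return:

```python
import io
import json
from typing import Any, Callable, Dict, TextIO


def export_tree(payload: Dict[str, Any], stream: TextIO,
                dump: Callable[[Any, TextIO], None] = json.dump) -> None:
    """Write an already-serializable payload to any text stream.

    `dump` is any callable taking (obj, stream); json.dump is the default.
    """
    dump(payload, stream)


# Hypothetical output of a common `_export` step: plain dicts, lists and strings.
payload = {"identifier": "r", "tag": "Root",
           "children": [{"identifier": "a", "tag": "Leaf", "children": []}]}

buffer = io.StringIO()
export_tree(payload, buffer)
text = buffer.getvalue()
```

Any text stream works in place of the `io.StringIO` buffer, for example an open file. PyYAML's `yaml.safe_dump` accepts the same `(data, stream)` arguments, so it could be passed as `dump` if that package is installed.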

leonardbinet avatar Dec 15 '19 14:12 leonardbinet

@caesar0301 before I go further and implement the json/yaml serialization with stream output, do you have an opinion on this design: https://github.com/caesar0301/treelib/pull/133

alk-lbinet avatar Dec 16 '19 09:12 alk-lbinet

Hello, has from_json or anything similar been implemented yet? Since this request is still open, I assume not?

thanojo avatar Jun 29 '21 09:06 thanojo

https://anytree.readthedocs.io/en/latest/index.html

eckelsjd avatar Mar 17 '22 01:03 eckelsjd