anytree
Feature Request - Node Tree from list of delimited strings
I've got some files I'm trying to parse into a tree (a FEM assembly tree, actually) and need to take a list of lists and create a tree from it. Because of the way I get the raw input that I'm parsing (a TCL script), every list contains the full path, much of which is repeated from list to list. I'm sure I could eventually get this working without a 'batch node creator', but I wanted to throw this out there anyway for consideration that such a batch node creator be added in the future.
Below is a (hopefully understandable) minimum example copied and pasted from a markdown export of a Jupyter notebook. I think this request might be similar to others made previously and if so feel free to close this one and/or link it to other issues.
Sorry for the book-length post!!
from anytree import Node, RenderTree
from pathlib import Path
Make the Tree Manually w/ anytree
a1 = Node('Assembly 1', parent=None)
a1_sa1 = Node('Sub Assembly 1', parent=a1)
a1_sa1_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa1)
a1_sa1_ssa1_sssa1 = Node('Sub Sub Sub Assembly 1', parent=a1_sa1_ssa1)
a1_sa1_ssa1_sssa1.children = [Node('Component 1'), Node('Component 2')]
a1_sa2 = Node('Sub Assembly 2', parent=a1)
a1_sa2.children = [Node('Component 1'), Node('Component 2'), Node('Component 3'),
Node('Component 4'), Node('Component 5'), Node('Component 6'),
Node('Component 7'), Node('Component 8'), Node('Component 9'),
Node('Component 10'), Node('Component 11'), Node('Component 12'),
Node('Component 13'), Node('Component 14'), Node('Component 15'),
Node('Component 16')]
a1_sa3 = Node('Sub Assembly 3', parent=a1)
a1_sa3_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa3)
a1_sa3_ssa1_c1 = Node('Component 1', parent=a1_sa3_ssa1)
a1_sa3_ssa2 = Node('Sub Sub Assembly 2', parent=a1_sa3)
a1_sa3_ssa2_c1 = Node('Component 1', parent=a1_sa3_ssa2)
a1_sa3_ssa3 = Node('Sub Sub Assembly 3', parent=a1_sa3)
a1_sa3_ssa3_c1 = Node('Component 1', parent=a1_sa3_ssa3)
a1_sa3_c1 = Node('Component 1', parent=a1_sa3)
a1_sa3_c2 = Node('Component 2', parent=a1_sa3)
a1_sa3_c3 = Node('Component 3', parent=a1_sa3)
a1_sa3_c4 = Node('Component 4', parent=a1_sa3)
a1_sa3_ssa4 = Node('Sub Sub Assembly 4', parent=a1_sa3)
a1_sa3_ssa4_c1 = Node('Component 1', parent=a1_sa3_ssa4)
a1_sa3_ssa5 = Node('Sub Sub Assembly 5', parent=a1_sa3)
a1_sa3_ssa5_c1 = Node('Component 1', parent=a1_sa3_ssa5)
a1_sa3_ssa6 = Node('Sub Sub Assembly 6', parent=a1_sa3)
a1_sa3_ssa6_c1 = Node('Component 1', parent=a1_sa3_ssa6)
a1_sa3_ssa7 = Node('Sub Sub Assembly 7', parent=a1_sa3)
a1_sa3_ssa7_c1 = Node('Component 1', parent=a1_sa3_ssa7)
a1_sa3_ssa8 = Node('Sub Sub Assembly 8', parent=a1_sa3)
a1_sa3_ssa8_c1 = Node('Component 1', parent=a1_sa3_ssa8)
a1_sa3_ssa9 = Node('Sub Sub Assembly 9', parent=a1_sa3)
a1_sa3_ssa9_c1 = Node('Component 1', parent=a1_sa3_ssa9)
a1_sa3_ssa10 = Node('Sub Sub Assembly 10', parent=a1_sa3)
a1_sa3_ssa10_c1 = Node('Component 1', parent=a1_sa3_ssa10)
for pre, fill, node in RenderTree(a1):
    print(f'{pre}{node.name}')
Assembly 1
├── Sub Assembly 1
│   └── Sub Sub Assembly 1
│       └── Sub Sub Sub Assembly 1
│           ├── Component 1
│           └── Component 2
├── Sub Assembly 2
│   ├── Component 1
│   ├── Component 2
│   ├── Component 3
│   ├── Component 4
│   ├── Component 5
│   ├── Component 6
│   ├── Component 7
│   ├── Component 8
│   ├── Component 9
│   ├── Component 10
│   ├── Component 11
│   ├── Component 12
│   ├── Component 13
│   ├── Component 14
│   ├── Component 15
│   └── Component 16
└── Sub Assembly 3
    ├── Sub Sub Assembly 1
    │   └── Component 1
    ├── Sub Sub Assembly 2
    │   └── Component 1
    ├── Sub Sub Assembly 3
    │   └── Component 1
    ├── Component 1
    ├── Component 2
    ├── Component 3
    ├── Component 4
    ├── Sub Sub Assembly 4
    │   └── Component 1
    ├── Sub Sub Assembly 5
    │   └── Component 1
    ├── Sub Sub Assembly 6
    │   └── Component 1
    ├── Sub Sub Assembly 7
    │   └── Component 1
    ├── Sub Sub Assembly 8
    │   └── Component 1
    ├── Sub Sub Assembly 9
    │   └── Component 1
    └── Sub Sub Assembly 10
        └── Component 1
Parse the TCL Script Output File
Looks something like this (~ delimited; contents pasted here so as to not include ExampleTree_featureRequest_raw.txt):
~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component2
~Assembly1~SubAssembly2~Component1
~Assembly1~SubAssembly2~Component2
~Assembly1~SubAssembly2~Component3
~Assembly1~SubAssembly2~Component4
~Assembly1~SubAssembly2~Component5
~Assembly1~SubAssembly2~Component6
~Assembly1~SubAssembly2~Component7
~Assembly1~SubAssembly2~Component8
~Assembly1~SubAssembly2~Component9
~Assembly1~SubAssembly2~Component10
~Assembly1~SubAssembly2~Component11
~Assembly1~SubAssembly2~Component12
~Assembly1~SubAssembly2~Component13
~Assembly1~SubAssembly2~Component14
~Assembly1~SubAssembly2~Component15
~Assembly1~SubAssembly2~Component16
~Assembly1~SubAssembly3~SubSubAssembly1~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly2~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly1~Component1
~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly2~Component1
~Assembly1~SubAssembly3~SubSubAssembly4~Component1
~Assembly1~SubAssembly3~SubSubAssembly4~Component2
~Assembly1~SubAssembly3~SubSubAssembly4~Component3
~Assembly1~SubAssembly3~SubSubAssembly4~Component4
~Assembly1~SubAssembly3~SubSubAssembly4~Component1
~Assembly1~SubAssembly3~SubSubAssembly5~Component1
~Assembly1~SubAssembly3~SubSubAssembly6~Component1
~Assembly1~SubAssembly3~SubSubAssembly7~Component1
~Assembly1~SubAssembly3~SubSubAssembly8~Component1
~Assembly1~SubAssembly3~SubSubAssembly9~Component1
~Assembly1~SubAssembly3~SubSubAssembly10~Component1
# Output file from the TCL script
raw_tcl_output = Path().cwd().joinpath('ExampleTree_featureRequest_raw.txt')
Read in the raw TCL Output File
with open(raw_tcl_output, 'r') as fin:
    content = fin.readlines()
Parse the content into list of lists
# Strip newlines, split into lists
content = [c.strip('\n') for c in content]
content = [c.split('~') for c in content]
display(content[0])
# Take all but the first after splitting to remove blank at beginning
# I guess that blank (i.e. ``content[0][0]``) is like the root?
content = [c[1:] for c in content]
['',
'Assembly1',
'SubAssembly1',
'SubSubAssembly1',
'SubSubSubAssembly1',
'Component1']
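On the question of where that leading blank comes from: it is not a root, just a side effect of how `str.split` treats a delimiter at the very start of a string. A minimal sketch, using a shortened sample line that is made up for illustration:

```python
# Shortened sample line (illustrative only), in the same '~'-delimited shape.
line = "~Assembly1~SubAssembly1~Component1"

# A delimiter at position 0 leaves an empty string in front of it.
parts = line.split("~")
print(parts)   # ['', 'Assembly1', 'SubAssembly1', 'Component1']

# Two equivalent ways to drop it: slice it off, or strip the delimiter first.
names_sliced = parts[1:]
names_stripped = line.strip("~").split("~")
print(names_sliced == names_stripped)   # True
```

Stripping first also tolerates a trailing `~`, which the `[1:]` slice would not.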
content[0]
['Assembly1',
'SubAssembly1',
'SubSubAssembly1',
'SubSubSubAssembly1',
'Component1']
content
[['Assembly1',
'SubAssembly1',
'SubSubAssembly1',
'SubSubSubAssembly1',
'Component1'],
['Assembly1',
'SubAssembly1',
'SubSubAssembly1',
'SubSubSubAssembly1',
'Component2'],
['Assembly1', 'SubAssembly2', 'Component1'],
['Assembly1', 'SubAssembly2', 'Component2'],
['Assembly1', 'SubAssembly2', 'Component3'],
['Assembly1', 'SubAssembly2', 'Component4'],
['Assembly1', 'SubAssembly2', 'Component5'],
['Assembly1', 'SubAssembly2', 'Component6'],
['Assembly1', 'SubAssembly2', 'Component7'],
['Assembly1', 'SubAssembly2', 'Component8'],
['Assembly1', 'SubAssembly2', 'Component9'],
['Assembly1', 'SubAssembly2', 'Component10'],
['Assembly1', 'SubAssembly2', 'Component11'],
['Assembly1', 'SubAssembly2', 'Component12'],
['Assembly1', 'SubAssembly2', 'Component13'],
['Assembly1', 'SubAssembly2', 'Component14'],
['Assembly1', 'SubAssembly2', 'Component15'],
['Assembly1', 'SubAssembly2', 'Component16'],
['Assembly1',
'SubAssembly3',
'SubSubAssembly1',
'SubSubSubAssembly1',
'Component1'],
['Assembly1',
'SubAssembly3',
'SubSubAssembly2',
'SubSubSubAssembly1',
'Component1'],
['Assembly1',
'SubAssembly3',
'SubSubAssembly3',
'SubSubSubAssembly1',
'Component1'],
['Assembly1',
'SubAssembly3',
'SubSubAssembly3',
'SubSubSubAssembly2',
'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component2'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component3'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component4'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly4', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly5', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly6', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly7', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly8', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly9', 'Component1'],
['Assembly1', 'SubAssembly3', 'SubSubAssembly10', 'Component1']]
Desired Feature
A way to 'batch create' a node tree. Some command that will take in a list of delimited node-childNodes-etc. and create a valid Node object from it.
Example:
>>> list1 = ['Assembly1', 'SubAssembly1', 'SubSubAssembly1', 'SubSubSubAssembly1', 'Component1']
>>> node_from_list1 = SomeNewBatchCreateNodeFunction(input_list=list1)
The above should produce the same result as doing it by hand:
a1 = Node('Assembly 1', parent=None)
a1_sa1 = Node('Sub Assembly 1', parent=a1)
a1_sa1_ssa1 = Node('Sub Sub Assembly 1', parent=a1_sa1)
a1_sa1_ssa1_sssa1 = Node('Sub Sub Sub Assembly 1', parent=a1_sa1_ssa1)
a1_sa1_ssa1_sssa1.children = [Node('Component 1')]
Related Issues
Open
- Issue 108 - csv importer/exporter with full path address
- Issue 74 - Can you join the features that merge two trees into one? please
- Issue 25 - Are there plans to include diff feature which compares two trees and points out the differences?
Closed
- Issue 72 - Node.path_to method?
  - This might actually be what I'm looking for?
Semi-Related Issues
Open
Closed
- Issue 75 - builder-like tree construction
  - Semi-related because I essentially want to avoid having to manually nest the children=[...] stuff
- Issue 30 - Plans to persist tree to file and create a tree from file?
  - Semi-related because 'get the tree back from a file' is similar in concept to what I really need to do
You mean like this?
from anytree import Node, findall_by_attr, RenderTree
lines = ['~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component1',
'~Assembly1~SubAssembly1~SubSubAssembly1~SubSubSubAssembly1~Component2',
'~Assembly1~SubAssembly2~Component1',
'~Assembly1~SubAssembly2~Component2',
'~Assembly1~SubAssembly2~Component3',
'~Assembly1~SubAssembly2~Component4',
'~Assembly1~SubAssembly2~Component5',
'~Assembly1~SubAssembly2~Component6',
'~Assembly1~SubAssembly2~Component7',
'~Assembly1~SubAssembly2~Component8',
'~Assembly1~SubAssembly2~Component9',
'~Assembly1~SubAssembly2~Component10',
'~Assembly1~SubAssembly2~Component11',
'~Assembly1~SubAssembly2~Component12',
'~Assembly1~SubAssembly2~Component13',
'~Assembly1~SubAssembly2~Component14',
'~Assembly1~SubAssembly2~Component15',
'~Assembly1~SubAssembly2~Component16',
'~Assembly1~SubAssembly3~SubSubAssembly1~SubSubSubAssembly1~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly2~SubSubSubAssembly1~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly1~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly3~SubSubSubAssembly2~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly4~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly4~Component2',
'~Assembly1~SubAssembly3~SubSubAssembly4~Component3',
'~Assembly1~SubAssembly3~SubSubAssembly4~Component4',
'~Assembly1~SubAssembly3~SubSubAssembly4~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly5~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly6~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly7~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly8~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly9~Component1',
'~Assembly1~SubAssembly3~SubSubAssembly10~Component1', ]
def from_assembly_line(root: Node = None, line: str = ''):
    nodenames = [x for x in line.split('~') if x]  # removing empty items
    # root node
    if root is None:
        root = Node(nodenames[0], parent=None)
    # walk down from the root, matching each name among the current node's children only;
    # a tree-wide search by name would pick the wrong parent when names repeat
    # (e.g. 'SubSubAssembly1' exists under both SubAssembly1 and SubAssembly3)
    parent = root
    for nodename in nodenames[1:]:
        child = next((c for c in parent.children if c.name == nodename), None)
        if child is None:
            child = Node(nodename, parent=parent)
        parent = child
    return root
if __name__ == '__main__':
    _root = None
    for _line in lines:
        _root = from_assembly_line(root=_root, line=_line)
    print(RenderTree(_root))
That looks like it'll work. I think I found a workaround for my issue above long ago, but I was hoping this could become a built-in feature in a future release. That way you wouldn't have to write your own function to do it, even if it is a pretty simple (in hindsight) function.
Something more general that I would like would be a function to turn a list like this:
l = [("Europe", "Italy", "Rome"),
     ("Europe", "Italy", "Milan"),
     ("Europe", "France", "Paris")]
into a tree:
Europe
    Italy
        Rome
        Milan
    France
        Paris
The reason is that I have many hierarchies stored in csv files and with pandas and such a function I can easily convert them to trees. It would also solve your problem, because you can just split each of your strings.
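To make the row shape concrete, here is a small sketch that reads such rows from CSV text with the standard-library `csv` module (instead of pandas) and merges shared prefixes into a plain nested dict. The CSV content and the helper names `rows_to_nested_dict` and `render` are made up for illustration; this is not anytree API.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from a file on disk.
csv_text = "Europe,Italy,Rome\nEurope,Italy,Milan\nEurope,France,Paris\n"

# One tuple per CSV line, e.g. ('Europe', 'Italy', 'Rome').
rows = [tuple(record) for record in csv.reader(io.StringIO(csv_text))]


def rows_to_nested_dict(rows):
    """Merge rows that share a prefix into one nested dict (insertion-ordered)."""
    tree = {}
    for row in rows:
        level = tree
        for name in row:
            # Reuse the existing branch if this name was already seen at this position.
            level = level.setdefault(name, {})
    return tree


def render(tree, indent=0):
    """Return the indented lines for the nested dict, two spaces per level."""
    out = []
    for name, children in tree.items():
        out.append("  " * indent + name)
        out.extend(render(children, indent + 1))
    return out


nested = rows_to_nested_dict(rows)
print("\n".join(render(nested)))
```

Because each level is looked up inside its parent's dict, equal names under different parents stay separate.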
Here are sample implementations for my ideas above:
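import anytree

def from_rows(rows, node_factory=anytree.Node, root_name="root"):
    created_nodes = {}  # full path prefix -> node
    root = node_factory(root_name)
    for row in rows:
        parent_node = root
        for depth, col in enumerate(row):
            # key on the whole path prefix, not just (depth, col), so that equal
            # names at the same depth under different parents stay separate nodes
            key = tuple(row[:depth + 1])
            if key in created_nodes:
                node = created_nodes[key]
            else:
                node = node_factory(col)
                node.parent = parent_node
                created_nodes[key] = node
            parent_node = node
    return root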
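def to_rows(root, str_factory=str, skip_root=True):
    # note: for anytree's Node, str(node) is its repr (e.g. Node('/root/Europe'));
    # pass str_factory=lambda node: node.name to get plain names back
    index = 1 if skip_root else 0
    for leaf in root.leaves:
        yield [str_factory(node) for node in leaf.path[index:]]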
Update 2024-01-06:
I implemented this in littletree using functions Node.from_rows, Node.to_rows.