anytree
Why is IndentedStringImporter gone?
About one year ago I had IndentedStringImporter installed with anytree. Now after a reinstallation of the OS and all my tools I realize that it is no longer present.
I use it a lot: for example, biological taxonomies and the hierarchical keywords that photo tools import and export ("controlled vocabulary") are indented text files.
Fortunately I kept a backup of the code, but I would prefer an official installation.
It was never an official implementation, just code on a branch. I will double check.
Just to add to the convo here: I'm also interested in seeing the IndentedStringImporter (perhaps also an IndentedStringExporter?) added in. I read the initial feature request and tried looking for the source code, but my GitHub-fu is not very good.
I was just thinking about implementing an indented string importer, something that would read:
Foo
Bar
Baz
Boz
Bitz
Blah
...and construct a tree with just that.
If I implement such a thing in a branch, is there any chance that it would be accepted? Is the project taking contributions?
I've created a pull request with an implementation of the functionality, docstring documentation, and nose tests.
Any updates on this?
It seems not. If it can help you while waiting for an official version: I still use the original version (file indentedstringimporter.py of 2019) which I carefully kept. No warranty of course but for my needs it's enough.
# -*- coding: utf-8 -*-
from anytree import AnyNode


#---------------------------------------
def _get_indentation(line):
    # Strip the leading spaces; the difference in length is the indentation.
    # (Unlike splitting the line on its content, this also works for empty lines.)
    content = line.lstrip(' ')
    indentation_length = len(line) - len(content)
    return indentation_length, content


#*******************************************************************************
class IndentedStringImporter(object):

    def __init__(self, nodecls=AnyNode):
        u"""
        Import Tree from a single string (with all the lines) or list of strings
        (lines) with indentation.

        Every indented line is converted to an instance of `nodecls`. The string
        (without indentation) found on each line is set as the respective node name.

        This importer does not constrain indented data to have a definite number of
        whitespaces (multiple of any number). A node is considered a child of a
        parent simply if its indentation is bigger than its parent's.

        This means that the tree can have siblings with different indentations,
        as long as the siblings' indentations are bigger than the respective parent's
        (but not necessarily the same as each other's).

        Keyword Args:
            nodecls: class used for nodes.

        Example using a string list:

        >>> from anytree.importer import IndentedStringImporter
        >>> from anytree import RenderTree
        >>> importer = IndentedStringImporter()
        >>> lines = [
        ...    'Node1',
        ...    'Node2',
        ...    '    Node3',
        ...    'Node5',
        ...    '    Node6',
        ...    '        Node7',
        ...    '    Node8',
        ...    '        Node9',
        ...    '      Node10',
        ...    '    Node11',
        ...    '  Node12',
        ...    'Node13',
        ... ]
        >>> root = importer.import_(lines)
        >>> print(RenderTree(root))
        AnyNode(name='root')
        ├── AnyNode(name='Node1')
        ├── AnyNode(name='Node2')
        │   └── AnyNode(name='Node3')
        ├── AnyNode(name='Node5')
        │   ├── AnyNode(name='Node6')
        │   │   └── AnyNode(name='Node7')
        │   ├── AnyNode(name='Node8')
        │   │   ├── AnyNode(name='Node9')
        │   │   └── AnyNode(name='Node10')
        │   ├── AnyNode(name='Node11')
        │   └── AnyNode(name='Node12')
        └── AnyNode(name='Node13')

        Example using a string:

        >>> string = "Node1\n  Node2\n  Node3\n    Node4"
        >>> root = importer.import_(string)
        >>> print(RenderTree(root))
        AnyNode(name='root')
        └── AnyNode(name='Node1')
            ├── AnyNode(name='Node2')
            └── AnyNode(name='Node3')
                └── AnyNode(name='Node4')
        """
        self.nodecls = nodecls

    #------------------------------------
    def _tree_from_indented_str(self, data):
        if isinstance(data, str):
            lines = data.splitlines()
        else:
            lines = data
        root = self.nodecls(name="root")
        indentations = {}
        for line in lines:
            cur_indent, name = _get_indentation(line)

            if len(indentations) == 0:
                parent = root
            elif cur_indent not in indentations:
                # parent is the next lower indentation
                keys = [key for key in indentations.keys()
                        if key < cur_indent]
                parent = indentations[max(keys)]
            else:
                # current line uses the parent of the last line
                # with same indentation
                # and replaces it as the last line with this given indentation
                parent = indentations[cur_indent].parent

            indentations[cur_indent] = self.nodecls(name=name, parent=parent)

            # delete all higher indentations
            keys = [key for key in indentations.keys() if key > cur_indent]
            for key in keys:
                indentations.pop(key)
        return root

    #------------------------------------
    def import_(self, data):
        # data: single string or a list of lines
        return self._tree_from_indented_str(data)
Thanks @regexgit for pointing out the original version. In the end I wrote my own lightweight implementation instead. It converts an indented config (not text, strictly speaking, since I assume each line is unique within its indented block) to an n-ary tree using raw nested dicts.
The goal was to compare (and merge) two config files whilst being aware of the indented blocks scope. Unlike anytree, it won't meet everyone's requirements but if anyone is interested: text to tree conversion in 10 lines of code and an example. I also published a simple gist.
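For readers who cannot reach the linked gist, the nested-dict idea might look roughly like the sketch below. This is not the poster's code: the function name `indented_to_dict` and the `width` parameter are made up for illustration, and the sketch assumes space indentation in fixed steps and lines that are unique within their block.

```python
def indented_to_dict(lines, width=4):
    """Convert space-indented lines into nested dicts keyed by line text.

    Hypothetical helper: assumes `width` spaces per level and that each
    line is unique within its block (duplicates are merged into one key).
    """
    root = {}
    # stack[d] is the dict that receives entries found at depth d
    stack = [root]
    for raw in lines:
        line = raw.rstrip()
        if not line:
            continue  # ignore blank lines
        text = line.lstrip(" ")
        depth = (len(line) - len(text)) // width
        if depth >= len(stack):
            raise ValueError(f"line skips an indentation level: {raw!r}")
        # Forget blocks deeper than the current line, then descend into it
        del stack[depth + 1:]
        stack.append(stack[depth].setdefault(text, {}))
    return root
```

Because the result is plain dicts, two configs can then be compared block by block with ordinary dict operations, for example by looking at the key sets on each level.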
I would also be interested in this.
I actually created my own version. It wasn't written for anytree (but can probably easily be changed) and it may not be very flexible or fault-tolerant, but it should be reasonably fast for correct input:
import re

# `Node` is the poster's own tree class (not anytree's); it is assumed to
# provide an `add(code)` method that creates and returns a child node.
def from_indented_file(file, indent='@'):  # Change to "    " if 4 spaces are desired
    # Each line consists of indent and code
    pattern = re.compile(rf"^(?P<prefix>({re.escape(indent)})*)(?P<code>.*)")
    root = Node()
    stack = [root]
    for line in file:
        match = pattern.match(line)
        prefix, code = match['prefix'], match['code']
        depth = len(prefix) // len(indent)
        parent_node = stack[depth]
        node = parent_node.add(code)  # Should probably change to node = Node(parent=parent_node)
        # Place node as last item on index depth + 1
        del stack[depth + 1:]
        stack.append(node)
    return root
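The change hinted at in the comment could look something like this. It is only a sketch: the name `from_indented_lines` and the `nodecls` parameter are assumptions for illustration, and it relies on the node class accepting `name` and `parent` keyword arguments, which anytree's `Node` and `AnyNode` do. It also skips blank lines and reports over-indented lines instead of failing with an `IndexError`.

```python
import re


def from_indented_lines(lines, nodecls, indent="    "):
    """Build a tree from indented lines and return a synthetic root node.

    Hypothetical adaptation: `nodecls` must accept `name` and `parent`
    keyword arguments (anytree's Node and AnyNode are expected to fit).
    """
    pattern = re.compile(rf"^(?P<prefix>(?:{re.escape(indent)})*)(?P<code>.*)")
    root = nodecls(name="root", parent=None)
    # stack[d] is the parent for a line at depth d; stack[0] is the root
    stack = [root]
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        match = pattern.match(line)
        depth = len(match["prefix"]) // len(indent)
        if depth >= len(stack):
            raise ValueError(f"line {number} skips an indentation level")
        node = nodecls(name=match["code"], parent=stack[depth])
        # Drop deeper levels and make this node the parent for depth + 1
        del stack[depth + 1:]
        stack.append(node)
    return root
```

With anytree installed, a call such as `from_indented_lines(open(path), Node)` should then yield a tree that `RenderTree` can print; `path` here stands for whatever file is being read.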
If a pull request is accepted, maybe the best parts of all three implementations can be combined. I would also like to have an export to an indented file with the same options.
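As a starting point for the export direction, here is a minimal sketch. Nothing like it exists in anytree as far as this thread shows; the name `export_indented` and its parameters are invented for illustration. It only assumes that every node exposes `name` and `children`, as anytree's `Node` does, and that the root is a synthetic node that should normally be left out.

```python
def export_indented(root, indent="    ", include_root=False):
    """Return the tree as indented text, one node name per line.

    Hypothetical helper: works on any object with `name` and `children`
    attributes; it is not part of anytree.
    """
    lines = []

    def visit(node, depth):
        # Pre-order walk: emit the node, then its children one level deeper
        lines.append(indent * depth + str(node.name))
        for child in node.children:
            visit(child, depth + 1)

    if include_root:
        visit(root, 0)
    else:
        for child in root.children:
            visit(child, 0)
    return "\n".join(lines)
```

Using the same `indent` string for import and export would make the two operations symmetric, which seems to be what "the same options" asks for.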