Why is IndentedStringImporter gone?

Open regexgit opened this issue 5 years ago • 8 comments

About one year ago I had IndentedStringImporter installed with anytree. Now after a reinstallation of the OS and all my tools I realize that it is no longer present.

I use it a lot: for example, biological taxonomies and the hierarchical keywords that photo tools import and export ("controlled vocabulary") are indented text files.

Fortunately I kept a backup of the code, but I would prefer an official installation.

regexgit avatar Nov 05 '19 16:11 regexgit

It was never an official implementation, just code on a branch. I will double-check.

c0fec0de avatar Jan 10 '20 00:01 c0fec0de

Just to add to the convo here: I'm also interested in seeing the IndentedStringImporter (perhaps also an IndentedStringExporter?) added in. I read the initial feature request and tried looking for the source code, but my GitHub-fu is not very good.

als0052 avatar Jan 14 '21 12:01 als0052

I was just thinking about implementing an indented string importer, something that would read:

Foo
  Bar
  Baz
    Boz
    Bitz
  Blah

...and construct a tree with just that.

If I implement such a thing in a branch, is there any chance that it would be accepted? Is the project taking contributions?

LionKimbro avatar Jan 23 '21 02:01 LionKimbro

Any updates on this?

angely-dev avatar Feb 14 '23 16:02 angely-dev

It seems not. If it can help you while waiting for an official version: I still use the original version (file indentedstringimporter.py of 2019) which I carefully kept. No warranty of course but for my needs it's enough.

# -*- coding: utf-8 -*-
from anytree import AnyNode

#---------------------------------------
def _get_indentation(line):
	# Strip the leading spaces; the indentation is whatever was removed.
	# Measuring lengths (instead of splitting on the content) avoids a
	# ValueError when the line is empty or contains only spaces.
	content = line.lstrip(' ')
	indentation_length = len(line) - len(content)
	return indentation_length, content

#*******************************************************************************
class IndentedStringImporter(object):

	def __init__(self, nodecls=AnyNode):
		u"""
		Import Tree from a single string (with all the lines) or list of strings
		(lines) with indentation.
		
		Every indented line is converted to an instance of `nodecls`. The string
		(without indentation) found on the lines are set as the respective node name.
		
		This importer does not constrain indented data to a fixed number of
		whitespaces (a multiple of any number). A node is considered a child of a
		parent simply if its indentation is bigger than the parent's.
		
		This means that the tree can have siblings with different indentations,
		as long as the siblings indentations are bigger than the respective parent
		(but not necessarily the same considering each other).
		
		Keyword Args:
		    nodecls: class used for nodes.
		
		Example using a string list:
		>>> from anytree.importer import IndentedStringImporter
		>>> from anytree import RenderTree
		>>> importer = IndentedStringImporter()
		>>> lines = [
		...    'Node1',
		...    'Node2',
		...    '    Node3',
		...    'Node5',
		...    '    Node6',
		...    '        Node7',
		...    '    Node8',
		...    '        Node9',
		...    '      Node10',
		...    '    Node11',
		...    '  Node12',
		...    'Node13',
		...]
		>>> root = importer.import_(lines)
		>>> print(RenderTree(root))
		AnyNode(name='root')
		├── AnyNode(name='Node1')
		├── AnyNode(name='Node2')
		│   └── AnyNode(name='Node3')
		├── AnyNode(name='Node5')
		│   ├── AnyNode(name='Node6')
		│   │   └── AnyNode(name='Node7')
		│   ├── AnyNode(name='Node8')
		│   │   ├── AnyNode(name='Node9')
		│   │   └── AnyNode(name='Node10')
		│   ├── AnyNode(name='Node11')
		│   └── AnyNode(name='Node12')
		└── AnyNode(name='Node13')
		
		Example using a string:
		>>> string = "Node1\n  Node2\n  Node3\n    Node4"
		>>> root = importer.import_(string)
		>>> print(RenderTree(root))
		AnyNode(name='root')
		└── AnyNode(name='Node1')
		    ├── AnyNode(name='Node2')
		    └── AnyNode(name='Node3')
		        └── AnyNode(name='Node4')
		"""
		
		self.nodecls = nodecls
	
	#------------------------------------
	def _tree_from_indented_str(self, data):
		if isinstance(data, str):
			lines = data.splitlines()
		else:
			lines = data
		root = self.nodecls(name="root")
		indentations = {}
		for line in lines:
			cur_indent, name = _get_indentation(line)

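			if len(indentations) == 0:
				parent = root
			elif cur_indent not in indentations:
				# parent is the next lower indentation; fall back to root
				# when no shallower line is open (e.g. the first line was
				# indented deeper than this one)
				keys = [key for key in indentations.keys()
						  if key < cur_indent]
				parent = indentations[max(keys)] if keys else root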
			else:
				# current line uses the parent of the last line
				# with same indentation
				# and replaces it as the last line with this given indentation
				parent = indentations[cur_indent].parent

			indentations[cur_indent] = self.nodecls(name=name, parent=parent)

			# delete all higher indentations
			keys = [key for key in indentations.keys() if key > cur_indent]
			for key in keys:
				indentations.pop(key)
		return root
	
	#------------------------------------
	def import_(self, data):
		# data: single string or a list of lines
		return self._tree_from_indented_str(data)

regexgit avatar Feb 15 '23 09:02 regexgit

Thanks @regexgit for pointing out the original version, but I ended up writing my own lightweight implementation. It converts an indented config (not text, strictly speaking, since I assume each line is unique within its indented block) to an n-ary tree using raw nested dicts.

The goal was to compare (and merge) two config files while staying aware of the scope of each indented block. Unlike anytree, it won't meet everyone's requirements, but if anyone is interested: text to tree conversion in 10 lines of code and an example. I also published a simple gist.
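
Since the links did not survive in this copy of the thread, here is a minimal sketch of what an indented-text-to-nested-dicts conversion could look like. It is not the implementation referred to above: the function name `indented_text_to_dict` is hypothetical, and the sketch assumes space-only indentation and that each line is unique within its block (a repeated line is merged into the existing key).

```python
def indented_text_to_dict(text):
    """Convert space-indented text into nested dicts keyed by the stripped line.

    Hypothetical helper for illustration only; assumes spaces (no tabs) and
    that lines are unique within their indented block.
    """
    root = {}
    # Stack of (indentation, dict) pairs for the blocks that are still open.
    # The sentinel indentation -1 keeps the root dict on the stack forever.
    stack = [(-1, root)]
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        content = line.lstrip(" ")
        indent = len(line) - len(content)
        # Close every open block that is at least as deep as this line.
        while stack[-1][0] >= indent:
            stack.pop()
        # The innermost remaining block is the parent of this line.
        child = stack[-1][1].setdefault(content.rstrip(), {})
        stack.append((indent, child))
    return root


example = "Foo\n  Bar\n  Baz\n    Boz\n    Bitz\n  Blah"
print(indented_text_to_dict(example))
# {'Foo': {'Bar': {}, 'Baz': {'Boz': {}, 'Bitz': {}}, 'Blah': {}}}
```

Like the 2019 importer, this accepts irregular indentation: a line becomes a child of the nearest preceding line that is indented less.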

angely-dev avatar Mar 01 '23 09:03 angely-dev

I would also be interested in this.

I actually created my own version. It wasn't written for anytree (but can probably easily be changed) and it may not be very flexible or fault-tolerant, but it should be reasonably fast for correct input:

    import re

    def from_indented_file(file, indent='@'):  # Change to "    " if 4 spaces are desired
        # Each line consists of indent and code
        pattern = re.compile(rf"^(?P<prefix>({re.escape(indent)})*)(?P<code>.*)")

        root = Node()
        stack = [root]

        for line in file:
            match = pattern.match(line)
            prefix, code = match['prefix'], match['code']
            depth = len(prefix) // len(indent)
            parent_node = stack[depth]
            node = parent_node.add(code)  # Should probably change to node = Node(parent=parent_node)

            # Place node as last item on index depth + 1
            del stack[depth + 1:]
            stack.append(node)

        return root

If a pull request is accepted, maybe the best parts of all three implementations can be combined. I would also like to have an export to an indented file with the same options.
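
For the export direction, a rough sketch under stated assumptions: the function `export_indented` and the stand-in class `SimpleNode` are hypothetical names, not part of anytree. The function only relies on each node having a `name` attribute and an iterable `children` attribute, which as far as I know anytree's `Node` objects also provide, but treat that as an assumption rather than a tested claim.

```python
class SimpleNode:
    """Hypothetical stand-in for a tree node; only `name` and `children` are used."""

    def __init__(self, name, parent=None):
        self.name = name
        self.children = []
        if parent is not None:
            parent.children.append(self)


def export_indented(node, indent="  ", depth=0):
    """Return the subtree rooted at `node` as a list of indented lines."""
    # One line for this node, prefixed with one `indent` unit per level.
    lines = [indent * depth + str(node.name)]
    # Children follow in order, one level deeper (pre-order traversal).
    for child in node.children:
        lines.extend(export_indented(child, indent, depth + 1))
    return lines


foo = SimpleNode("Foo")
SimpleNode("Bar", parent=foo)
baz = SimpleNode("Baz", parent=foo)
SimpleNode("Boz", parent=baz)
SimpleNode("Bitz", parent=baz)
SimpleNode("Blah", parent=foo)

print("\n".join(export_indented(foo)))
# Foo
#   Bar
#   Baz
#     Boz
#     Bitz
#   Blah
```

The `indent` parameter mirrors the option in the importer above, so passing `indent='@'` or four spaces would produce a file that importer can read back.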

lverweijen avatar Jul 04 '23 19:07 lverweijen