pydatastructs icon indicating copy to clipboard operation
pydatastructs copied to clipboard

Add m-ary trees

Open czgdp1807 opened this issue 5 years ago • 28 comments

Description of the problem

Adding m-ary trees will be a nice idea. May be a array based(allows printing the trees) or pointer(better space utilisation) or both would be better.

Example of the problem

References/Other comments

.. [1] https://en.wikipedia.org/wiki/M-ary_tree

czgdp1807 avatar Feb 26 '20 05:02 czgdp1807

Hi, can I start working on this issue as a part of RGSoC20?

vibhu18116 avatar Mar 02 '20 17:03 vibhu18116

Hi, @vibhu18116 Please start by designing an API for m-ary trees. Please let us know if you face any problem/queries.

czgdp1807 avatar Mar 02 '20 18:03 czgdp1807

Thank you for allowing me to work on this.

I believe the API design would closely follow that of binary trees. Hence these are a few APIs I think should form a basis:

  • __new__(cls, m_value = 2, array_flag = 0, key=None, root_data=None, comp=None)

This would initialise a new m-ary tree. m_value will be used to determine the maximum value of m (the number of children of a node), default to 2 (binary tree in that case). array_flag can be set to indicate that an array implementation is required and it can remain 0 for pointer implementation.

Rest of the parameters will be same as binary tree implementation.

  • insert(self, key, data)
  • delete(self, key, **kwargs)
  • search(self, key, **kwargs)
  • __str__(self)

The above four will behave in the same way as in the binary tree. Implementation of the __str__ function will vary depending upon if the tree is implemented through array or pointers.

  • height(self) Will return the height of the tree.

  • width(self) Will return the width of the tree.

  • convert_to_binary(self, array_flag = 0) This will convert an m-ary tree to binary tree and return the pointer to root if array_flag is not set and the array if array_flag is set.

Kindly check if these APIs work.

vibhu18116 avatar Mar 02 '20 19:03 vibhu18116

__new__(cls, m_value = 2, array_flag = 0, key=None, root_data=None, comp=None)

IMO, using only array based implementation will be better here as pointers are somewhat unsafe, so avoid them where ever possible. Instead of m_value, a better name would be max_children. This suggests a nice way to implement using arrays.

Can you please explain the use of the values returned by height and width methods?

convert_to_binary(self, array_flag = 0)

IMO, just to_binary_tree(self) would better.

You may refer, https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/trees/binary_trees.py while implementing the above APIs.

czgdp1807 avatar Mar 03 '20 09:03 czgdp1807

Note that m-ary tree is not a search tree, so we don't need to worry about the manner in which nodes are stored. We need a creative way to avoid the exponential space complexity of array based implementation. May be a new type of node inheriting TreeNode can be made which will be having an array storing the indices of it's children node in the array of m-ary trees. Please let me know if you are unable to get my point.

czgdp1807 avatar Mar 03 '20 09:03 czgdp1807

I thought height and width of a tree can provide additional functionality to the user, however other than from feature point of view, I don't find any suitable use case. So, if you suggest I would drop them.

I would update the names of functions as suggested.

vibhu18116 avatar Mar 04 '20 17:03 vibhu18116

You may refer, https://github.com/codezonediitj/pydatastructs/blob/master/pydatastructs/trees/binary_trees.py while implementing the above APIs.

I checked this implementation, but a lot of functions like insert, delete are not implemented, probably because they depend on comparision key. Since the insert, delete API would follow the same pattern, should I first implement them?

vibhu18116 avatar Mar 04 '20 18:03 vibhu18116

Actually, binary trees are quite abstract, hence their insert and delete operations are left unimplemented. What logic should be followed while making insertions to a binary tree? Both the operations are well defined for binary search trees. IMO, we can keep insert, delete operations undefined for m-ary trees. We can a complementary m-ary search tree inheriting m-ary tree with insert and delete well defined for them. Here is defined for these kind of trees. You can do the work in two separate PRs for m-ary and m-ary search trees, one after the previous one.

czgdp1807 avatar Mar 05 '20 08:03 czgdp1807

I get your point. I will try to come up with actual implementation and try to create a PR in a day or two for m-ary tree.

vibhu18116 avatar Mar 10 '20 09:03 vibhu18116

Can I keep to_binary_tree method abstract in m-ary tree, or a concrete implementation is required?

If a concrete implementation is required, then should I modify the m-ary tree itself or create a new binary tree with nodes of type TreeNode?

Currently, I have made a new node type extending TreeNode as suggested earlier which form nodes of m-ary tree.

vibhu18116 avatar Mar 11 '20 19:03 vibhu18116

Can I keep to_binary_tree method abstract in m-ary tree, or a concrete implementation is required?

We can keep it as abstraction in m-ary tree but a concrete implementation can be added in m-ary search tree.

If a concrete implementation is required, then should I modify the m-ary tree itself or create a new binary tree with nodes of type TreeNode?

IMO, a new tree should returned.

czgdp1807 avatar Mar 11 '20 19:03 czgdp1807

I created a PR. However, the Travis checks failed due to some trailing whitespaces. Can you please guide how do I ensure there are no trailing white spaces?

Also can you review the code once to see if it is the desired implementation?

vibhu18116 avatar Mar 12 '20 06:03 vibhu18116

It's time to make a PR for MArySearchTree and the relevant tests.

czgdp1807 avatar Mar 18 '20 19:03 czgdp1807

Okay. I will start work on it soon :)

vibhu18116 avatar Mar 18 '20 19:03 vibhu18116

And I hope you are working on your RGSoC application. The deadline is 30th March according to the latest information I have. See, https://railsgirlssummerofcode.org/students/application/#apply

czgdp1807 avatar Mar 18 '20 20:03 czgdp1807

Yes. I will start working on application as well.

vibhu18116 avatar Mar 18 '20 20:03 vibhu18116

It's time to make a PR for MArySearchTree and the relevant tests.

I was looking through MArySearchTree on the link you referenced earlier.

However, I don't see a determinate way of insertion.

Say, if the nodes can have 4 children, and root has currently 3 only as a data item, then on trying to insert 5, it can become a part of the root, or go in a child node to the right of 3.

Can you clarify this point once?

vibhu18116 avatar Apr 07 '20 17:04 vibhu18116

I see, the current implementation in m_ary_trees.py doesn't align with the description given here. It should contain a group of keys but it just contains a single key. A minor update is needed on it.

I was looking through MArySearchTree on the link you referenced earlier.

However, I don't see a determinate way of insertion.

Some research is needed.

czgdp1807 avatar Apr 08 '20 13:04 czgdp1807

I see, the current implementation in m_ary_trees.py doesn't align with the description given here. It should contain a group of keys but it just contains a single key. A minor update is needed on it.

I think adding a method to insert a value in the node in MAryTreeNode should work. The value will be added if currently the number of values in the node are less than (max_children - 1), else, error can be thrown.

Does this work?

vibhu18116 avatar Apr 10 '20 16:04 vibhu18116

I think adding a method to insert a value in the node in MAryTreeNode should work.

How it will help in having multiple keys in the node?

czgdp1807 avatar Apr 11 '20 06:04 czgdp1807

So, the node can store the (data, key) as a tuple in a dynamically sized array which can grow up to (m-1) size where m is the maximum number of children a node can have.

The information about the children nodes is already stored in the children array.

vibhu18116 avatar Apr 12 '20 13:04 vibhu18116

Here's what I have thought of,

MAryTreeNode

  1. key_list - A SinglyLinkedList with key, data in LinkedListNode. The framework for linked lists is in linear_data_structures. We can keep array as well but for consistency, tuple might not be a good idea instead, I'd suggest to factor out the key and data part to the abstract node class as almost every node has these two members and then just use the abstract Node class as common type for array of keys.
  2. children - An array of child indices with i-th one pointing to a node whose keys are between key_list[i] and key_list[i-1].

The children array is already there so only 1 needs to be added.

czgdp1807 avatar Apr 12 '20 19:04 czgdp1807

This works. But the array could have been subjected to binary search to search a particular element in that node. However, this won't be possible in the linked list implementation. So, what is your opinion about this?

vibhu18116 avatar Apr 15 '20 17:04 vibhu18116

But the array could have been subjected to binary search to search a particular element in that node.

Well, the maximum size of the key_list won't be very large so, O(log(m)) or O(m) will not make much of a difference. Over optimisation can be avoided here. If m is large then MAryTree will not be of any practical use.

czgdp1807 avatar Apr 16 '20 11:04 czgdp1807

If m is large then MAryTree will not be of any practical use.

Though if you are aware of any use case where m is very large then we can think of using arrays.

czgdp1807 avatar Apr 16 '20 11:04 czgdp1807

I didn't find any specifics about the node size in different applications. So, if you say, I can proceed with a linked list.

vibhu18116 avatar Apr 22 '20 08:04 vibhu18116

Sure, feel free to proceed.

czgdp1807 avatar Apr 22 '20 09:04 czgdp1807

@czgdp1807 I hope this tree has already been implemented why not close this one!

Arvind-raj06 avatar Feb 17 '21 15:02 Arvind-raj06