rv32emu
                                
                                 rv32emu copied to clipboard
                                
                                    rv32emu copied to clipboard
                            
                            
                            
                        Improve red-black tree implementation in map
map.[ch] is the C Implementation for C++ std::map using red-black tree. It works well, but there is room for improvements. We can take some tricks used in Red-black Trees in Linux for potential speedup. In addition, current rbtree can benefit from the introduction of indirect pointers.
commit 7fb4ff754e48f09e21f0e21137fd28ce59361de5 integrates color into lowest bit of parent pointer.
The expected improvements can be the followings:
- 
Combining the color bit with the parent node pointer This is done by alignment, since map_node_tis defined to align withunsigned long(8 Bytes). The least three bits would be left unused, so we can integrate the color bit to the LSB.
- 
Indirect pointers Indirect pointers can be used to decrease the number of branches to increase better performance in pipelines. 
- 
Embed map_node_tinto structures for better cache locality Since the pointer we use to manipulate are embedded in the structure variable we are using, there would be one less layer of indirection and resulting to better cache locality.
- Indirect pointers Indirect pointers can be used to decrease the number of branches to increase better performance in pipelines.
commit ff46dad7571d3d62852ca4b8722804c8324835bb and commit 09f4e653627ebb2bec188fff504e95b91a42fac9 lower the number of branches in red-black tree.
rbi is non-recursive, and has very little memory overhead (e.g. no parent pointer within the red-black tree node). Its features:
- 3 tree implementations: unordered; ordered but unbalanced; balanced
- memory allocation is done outside the library
- lookup, iteration and insertion is without recursion and with constant memory usage
- compact memory representation: parent pointers are not used
- compact memory representation: option to store the red-black bit of the balanced implementation in the least significant bit of the (right) pointer
- operations other than lookup, insertion and iteration are not implemented
We might learn from rbi.
Below is my proposal for the first change.
- store the red-black bit of the balanced implementation in the least significant bit of the (right) pointer
- Removing parent pointes and rewrite corresponding functions
I will also write testing to ensure the correctness of all functions.
What do you think?
The following result is recorded by the simple benchmark I created. By inserting, searching, and deleting 10e6 random and not duplicated integers.
| action type | map | rb | 
|---|---|---|
| insert (sec) | 1.123425 | 0.446589 | 
| search (sec) | 1.000282 | 0.406894 | 
| delete (sec) | 0.173655 | 0.44963 | 
The following result is recorded by the simple benchmark I created. By inserting, searching, and deleting 10e6 random and not duplicated integers.
You shall provide the corresponding benchmark programs for reference purpose.
In particular, rbi.c lacks of the ability to remove. rb.h is exactly used for comparison.
You shall provide the corresponding benchmark programs for reference purpose.
Repository: rbtree_bench
- [x] 1. mention the machine specifications which you performed the experiments on.
- [x] 2. provide a top-level CMakeLists.txt that describes the way to build both map and rb;
- [x] 3. pull Google Tests automatically. See Quickstart: Building with CMake; and
- [x] 4. rename the repository as rbtree-bench. There is no separation between rb and tree.
- [x] 2. provide a top-level CMakeLists.txt that describes the way to build both map and rb;
- [x] 3. pull Google Tests automatically. See Quickstart: Building with CMake; and
The following changes are required.
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,4 +1,4 @@
-cmake_minimum_required (VERSION 2.8.11)
+cmake_minimum_required (VERSION 3.14)
 
 project (rbtree_bench)
 
@@ -10,6 +10,7 @@ FetchContent_Declare(
   googletest
   URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
 )
+FetchContent_MakeAvailable(googletest)
 
 add_subdirectory (map)
 add_subdirectory (rb)
diff --git a/map/CMakeLists.txt b/map/CMakeLists.txt
index 2701980..a67cbb5 100644
--- a/map/CMakeLists.txt
+++ b/map/CMakeLists.txt
@@ -1,4 +1,3 @@
-cmake_minimum_required(VERSION 3.5)
 project(map)
 
 enable_testing()
diff --git a/rb/CMakeLists.txt b/rb/CMakeLists.txt
index 06ab4d7..8bdf88d 100644
--- a/rb/CMakeLists.txt
+++ b/rb/CMakeLists.txt
@@ -1,10 +1,7 @@
-cmake_minimum_required(VERSION 3.5)
 project(rbi)
 
 enable_testing()
 
-find_package(GTest CONFIG REQUIRED)
-
 set (CMAKE_CXX_STANDARD 11)
 set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
 set(GCC_FLAGS "-s -Os -W -Wall -Werror")
The following result is recorded by the simple benchmark I created. By inserting, searching, and deleting 10e6 random and not duplicated integers.
Tested on eMag 8180, 64-bit 32-core Arm server.
| action type | map | rb | 
|---|---|---|
| insert (sec) | 2.341151 | 1.11663 | 
| search (sec) | 1.723217 | 0.851424 | 
| remove (sec) | 0.352333 | 1.13698 | 
Action items for rbtree_bench:
- [x] Unify the interface for accessing different rbtree implementations. You may wrap rb.h to fit the functions exposed by rv32emu's map.h.
- [x] Shrink rb.h by replacing it with much smaller one I provided. --> Done by commit
- [x] Utilize minibench for benchmarking -- simpler and more informative. --> Done by commit1 and commit2
- [x] Check existing rbtree benchmark programs and see if we can learn from the test scenarios. e.g., rbt_test.c from bind9 project and rb-bench.
- [x] Not only average time but also worst-case behavior should be considered.
- [x] Measure memory consumptions -- check Massif from Valgrind.
Close via commit 434c466