cloudberry icon indicating copy to clipboard operation
cloudberry copied to clipboard

Refactor ORCA code to support Access Method

Open my-ship-it opened this issue 2 years ago • 7 comments

my-ship-it avatar Jul 24 '23 08:07 my-ship-it

https://github.com/orgs/cloudberrydb/discussions/113 Partially implemented.

wfnuser avatar Jul 25 '23 06:07 wfnuser

The spiking work should include at least the following parts:

  1. Understanding how user-defined indexes work in PostgreSQL, with a focus on how amcostestimate function in amroutine works.
  2. Investigating all the modules in ORCA that need modification to add a new type of index and understanding how to make those changes - Debugging hashindex can be used to gain insights.
  3. Considering the adaptations required for ORCA to support newly added indexes in the form of extensions and contemplating the interface for plugin developers. Can the current approach of amroutine be reused? How to interact with ORCA?

wfnuser avatar Jul 26 '23 03:07 wfnuser

After hacking for a while, we now have a hash index plan and the results seem to be correct. However, there is still a significant amount of work ahead. Our next step is to determine how the cost model should be implemented.

postgres=# explain SELECT * FROM hash_i4_heap WHERE hash_i4_heap.random = 843938989;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..6.00 rows=1 width=8)
   ->  Index Scan using hash_i4_index on hash_i4_heap  (cost=0.00..6.00 rows=1 width=8)
         Index Cond: (random = 843938989)
 Optimizer: Pivotal Optimizer (GPORCA)
(4 rows)

wfnuser avatar Jul 26 '23 07:07 wfnuser

Hash index is a native index supported by PostgreSQL, and its implementation should be directly integrated into the source code. On the other hand, custom index implementations should be kept external and called through polymorphism, with specific interfaces to be designed accordingly.

Therefore, I intend to begin by implementing support for the hash index using the existing methods in ORCA. Once I become more acquainted with the relevant code, let's proceed with designing the refactor accordingly.

wfnuser avatar Jul 26 '23 07:07 wfnuser

Made a surprising discovery that gporca's support for indexes is actually very limited. The cost model for BRIN and GIN indexes is exactly the same, as both utilize the same computation method as the bitmap index.

wfnuser avatar Jul 27 '23 11:07 wfnuser

The communication between postgres and orca is a little bit harder than I thought. At first, I just want to attach a new field in amapi.h, just like amcostestimate. The advantage of this is I don't need to handle how to fetch datum from metadata and gporca already build the infrastructure to translate Relation to CMDIndexInfo, I can just modify it to make it work. However, I met incomplete type problems during the experiment. I cannot pass the c structure to cpp without including so many unnecessary header files.

My new plan is to create a new type of datum. Fetch and translate in GPORCA without amapi.h. From my research, I need to provide a wrapper for this datum, so that c and cpp can work together.

https://github.com/orgs/cloudberrydb/discussions/113

wfnuser avatar Aug 01 '23 11:08 wfnuser

Thanks @my-ship-it for the code review and guidance. I'm currently in the midst of transitioning out and may also be taking a break for a while. Nevertheless, if anyone is interested, feel free to reach out to me for discussions. Once I have more time in the future, I might continue exploring this issue in my spare time.

wfnuser avatar Aug 23 '23 10:08 wfnuser