cloudberry icon indicating copy to clipboard operation
cloudberry copied to clipboard

Dead GUC enable_geqo in CBDB, but not in Postgres

Open avamingli opened this issue 1 year ago • 13 comments

Cloudberry Database version

main

What happened

Commit 9e6035c5fefcbcd94be16b28935e91ba7c4bc339 bring back enable_geqo, but the GUC is still not used in CBDB: https://github.com/search?q=repo%3Acloudberrydb%2Fcloudberrydb+enable_geqo&type=code

In Postgres, it surely be used in planner: https://github.com/search?q=repo%3Apostgres%2Fpostgres%20enable_geqo&type=code

Though it's put in codes, but still a dead GUC in codes and users can not set it when using CBDB. Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

What you think should happen instead

No response

How to reproduce

NULL

Operating System

All

Anything else

No response

Are you willing to submit PR?

  • [ ] Yes, I am willing to submit a PR!

Code of Conduct

avamingli avatar Oct 09 '24 02:10 avamingli

This commit https://github.com/cloudberrydb/cloudberrydb/commit/9e6035c5fefcbcd94be16b28935e91ba7c4bc339 describes very clear that it is only for extension compatibility sake.

roseduan avatar Oct 09 '24 09:10 roseduan

This commit 9e6035c describes very clear that it is only for extension compatibility sake.

Yeah, my question is

Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

avamingli avatar Oct 09 '24 09:10 avamingli

This commit 9e6035c describes very clear that it is only for extension compatibility sake.

Yeah, my question is

Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

+1. If the kernel doesn't know the GUC, it could be defined in the extension. I doubt if the GUC is used by two or more extensions.

gfphoenix78 avatar Oct 21 '24 03:10 gfphoenix78

@reshke If the GUC is used only by the extension, why not define the GUC in the extension?

gfphoenix78 avatar Oct 21 '24 03:10 gfphoenix78

Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

I dont know the reasons and im purely unaware of them. But this looks like very-very old decision made and early as Greenplum 5 or even earlier. I did not find any discussion nor reasons in opensource.

@reshke If the GUC is used only by the extension, why not define the GUC in the extension?

Because this way you need to modify extension to that this extension is compatible with CBDB. So, for every extension that uses enable_geqo GUC, and that you want to use with CBDB, you need to do some modification. In my opinion is in purely beneficial to avoid that.

reshke avatar Oct 21 '24 10:10 reshke

Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

I dont know the reasons and im purely unaware of them. But this looks like very-very old decision made and early as Greenplum 5 or even earlier. I did not find any discussion nor reasons in opensource.

@reshke If the GUC is used only by the extension, why not define the GUC in the extension?

Because this way you need to modify extension to that this extension is compatible with CBDB. So, for every extension that uses enable_geqo GUC, and that you want to use with CBDB, you need to do some modification. In my opinion is in purely beneficial to avoid that.

The common code are dead code in the kernel. Normally, the common code for extensions are general and interactive with kernel code, like frame work, hook functions etc., not just defined but unused. The one principal of the code conduct is to make kernel code clean. Let's make the adapter code in the extension, not in the kernel.

gfphoenix78 avatar Oct 21 '24 11:10 gfphoenix78

Why GPDB/CBDB doesn't use enable_geqo unlike Postgres, shall we use it in codes or remove it entirely if there is any consideration by GPDB?

I dont know the reasons and im purely unaware of them. But this looks like very-very old decision made and early as Greenplum 5 or even earlier. I did not find any discussion nor reasons in opensource.

@reshke If the GUC is used only by the extension, why not define the GUC in the extension?

Because this way you need to modify extension to that this extension is compatible with CBDB. So, for every extension that uses enable_geqo GUC, and that you want to use with CBDB, you need to do some modification. In my opinion is in purely beneficial to avoid that.

The common code are dead code in the kernel. Normally, the common code for extensions are general and interactive with kernel code, like frame work, hook functions etc., not just defined but unused. The one principal of the code conduct is to make kernel code clean. Let's make the adapter code in the extension, not in the kernel.

In cbdb enable_geqo value is always false, so absence of core geqo related parts is not an issue, when not used. enable_geqo just signals extension to avoid there usages. IMO it is very inconsistent decision (made by Pivotal or even earlier) to remove enable_geqo, leaving all other geqo-related gucs in place. Example: https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/utils/misc/guc.c#L3780-L3789 Besides that, it is anyway more developer friendly to leave GUCS in kernel even with no use. Example: https://github.com/cloudberrydb/cloudberrydb/blob/main/src/backend/utils/misc/guc.c#L1219-L1228

reshke avatar Oct 21 '24 12:10 reshke

The geo related GUCs are ignored since Greenplum init commit, we don't know the concern behind that , and if it's still a concern now. I'm thinking of getting GUCs like enable_geqo back in planner codes.Because the search space could be very large if the we have many join tables, typically more than 10. And dynamic programming algorithm will be much slower than geo, that's why Postgres use it.

avamingli avatar Oct 21 '24 12:10 avamingli

I'm thinking of getting GUCs like enable_geqo back in planner codes.

totally agree

reshke avatar Oct 21 '24 12:10 reshke

I'm thinking of getting GUCs like enable_geqo back in planner codes.

totally agree

The challenges are: It will introduce plan diffs for ICW. And.. Genetic Algorithm is complex, I do have a little knowledge about that in my postgraduate dissertation, but I don't have the confidence to fix bugs quickly in Database planner codes.

avamingli avatar Oct 21 '24 12:10 avamingli

I do have a little knowledge about that in my postgraduate dissertation, but I don't have the confidence to fix bugs quickly in Database planner codes.

Thats ok, thats why enable_geqo GUC exists. If we default this GUC to false, any planner issue will be easily work-arounded. And also, if we catch some GEQO-related bugs, we can interact on issue using pgsql-hackers, huh? In cases, when bug is reproducible in pure postgres

reshke avatar Oct 21 '24 17:10 reshke

I have a question, so why we not call the DefineCustomBoolVariable (i mean define the enable_geqo in the guc.c) in CBDB, but only export it? :) @reshke

jiaqizho avatar Oct 23 '24 02:10 jiaqizho

I have a question, so why we not call the DefineCustomBoolVariable (i mean define the enable_geqo in the guc.c) in CBDB, but only export it? :) @reshke

Just define enable_geqo is most easy apply way.

yjhjstz avatar Dec 09 '24 08:12 yjhjstz