greptimedb
greptimedb copied to clipboard
Remove the unnecessary ser/de by substrait in standalone mode
What type of enhancement is this?
Performance
What does the enhancement do?
Currently the standalone use the same query engine just like distributed, which rewrites the logical plan and ser/de it by substrait. This is not required for standalone, and can bring some performance penalty.
Implementation challenges
No response
I'm negative about this ticket. This is what we chose not to do. It would make the code hard to maintain.
Notice we don't actually have a "standalone" for a long time. Only datanode, frontend and metasrv that are composed in different ways.
The "penalty" you are referring to should be small or doesn't exist -- at least at the query level. Even at the plan phase, where a sub-plan would be optimized again, the penalty is relatively small. Because those parts of plan are not optimized in frontend, so there is no thing like duplicate work. As for the substrait codec part, the overhead looks acceptable.
I'm not going to create a new query engine here, just trying to get rid of the unnecessary ser/de by substrait in standalone mode. It's like a ”surgical" refactor: just make standalone datanode manager called by LogicalPlan directly, without ser/de to bytes. Other things stay unchanged, including the query engine in region server. So, as to the optimization, the LogicalPlan still can be optimized in region server as always.
What I really want to achieve is to make "explain analyze" to work again, otherwise it's difficult to troubleshoot any query performance issues.
👍 That makes much sense.
As for the second part, there is a related issue https://github.com/GreptimeTeam/greptimedb/issues/2374. I may continue on it if @NiwakaDev is busy.