skywalking
skywalking copied to clipboard
[Feature] A lightweight and APM-oriented SQL parser module
Search before asking
- [X] I had searched in the issues and found no similar feature requirement.
Description
For years, we have supported database monitoring from traces, including metrics analysis and slow SQLs sampling. But still, the metrics are very general.
We can't tell the hot tables and read/write loads like we recently added for visual cache analysis, https://github.com/apache/skywalking/pull/9622
We tried to use Apache ShardingSphere to parse SQL to get the key information such as table names, operation(select/update/insert/delete/DDL), and key conditions, it is possible from the tech perspective, but not fast enough as SkyWalking OAP should handle 1m+ spans/s at least.
Here, after months of thinking, I want to propose that SkyWalking begins to provide a new module to implement SQL parser by ourselves. Like all works we did, there is no intent to compete with ShardingSphere SQL parser but orient APM scenarios.
What does APM-oriented mean?
We should not try to build a complete SQL grammar tree like a SQL database, because generally, we just want to fetch important information from to SQLs, and build read/write metrics, tables in SQLs(but not columns), key conditions(such as table join conditions).
What tech stack should we choose?
Antlr is still our first class choice, as
- It is stable, and widely used for years
- SkyWalking OAL engine is built on the top of it, we could have a consistent tech stack.
We should only consider building our own grammar tree analysis stack when antlr can't provide enough performance(highly not likely)
About SQL analysis
There are some antlr grammar definitions for SQLs in ShardingSphere and Antlr repositories, and also out there from various blogs. But we should build our own rather than copy/paste from them(not about the license).
We need a simplified grammar tree for SQL(SQL92/MySQL/PostgreSQL, etc.), to only fetch important information
- Tables
- Join conditions
- Operation type, DML(select/insert/update/delete) or DDL(a general type, not a performance concern).
- Hard code conditions rather than preparedStatement detecting
There is a similar feature from uptrace SQL parser. Their footprint idea is to fetch important information too.
Use case
No response
Related issues
Last time PoC to do SQL parser through ShardingSphere kernel
- https://github.com/apache/skywalking/issues/5838
- https://github.com/apache/shardingsphere/issues/8208
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I shared this idea with @Superskyyy and @yswdqz They both think they have interests to do this later.
Let's keep open discussion here, especially about perf testing and SQL key information fetch and build, we need more discussions before providing this officially.
About integrating this module into the agent-analyzer and adding new metrics/dashboards, we could wait for this module is ready.
@yswdqz As you are starting this, let's try to build a basic anlt4 grammar file for SQLs to get fingerprint information, and test its performance(using JMH) first. Don't go too far about how OAP should work and which metrics could be generated.
@yswdqz As you are starting this, let's try to build a basic anlt4 grammar file for SQLs to get fingerprint information, and test its performance(using JMH) first. Don't go too far about how OAP should work and which metrics could be generated.
Got it.
@yswdqz Any update? Are you back to this?
@yswdqz Any update? Are you back to this?
Yes, I have wrote a demo, but I'm not satisfied with the performance, I'm trying to adjust it now.
Do you have ideas to go further? If you need any help, let me know.
Do you have ideas to go further?
I am reading some blogs about how to enhance performance, and I am trying to optimize the rule of parser.
If you need any help, let me know.
I don't know how to locate which is slow, is there any third-party software that can count the run time of each function?
Yes, there are many Java profiling tool, like JProfiler. It could measure all methods with stack context.
Yes, there are many Java profiling tool, like JProfiler. It could measure all methods with stack context.
Got it , I will take a try.
I have tested many ways, I'm sorry to say that for the current performance, it is difficult to meet APM's parsing needs for sql.
I've thought about optimizing performance by simplifying parser rules, skipping unneeded parts, and using antlr4 actions. But there is no order of magnitude improvement.
This is the performance report of my parser : performance report
This is the design report of sql-sign: sql-sign
This is the repository of my parser: sql-parser Notice: it has 2 modes: with action and with visitor. With action mode can parse most dml statement. But visitor mode I only implement select and delete statement (because of after finish these, I already found that the problem of performance, so I have not completed it)
To test them in APM , we can use test file. It should be put in server-starter module's test folder. And change VirtualDatabaseProcessor#prepareVSIfNecessary as VirtualDatabaseProcessor.java . Then you can run benchmark to test the performance changes before and after code changes.
Finally, as mentioned in the performance report, The main performance problems are in the lexer analysis and parser analysis sections. So it's likely that future performance improvements will be in the parser and lexer. It might be a good idea not to use antlr4 (or other util) to implement the parser, but that would be too much work. In the future I will focus on solutions from other open source frameworks and try to solve this performance problem.
Thanks for your summary. Let's suspend from here, until we have a better way, which costs a reasonable resources for analyzing this.