[Feature] Remove log_comment from the cache key
Is your feature request related to a problem? Please describe.
I want to have metadata in system.query_log to know more about the queries being executed. For that I'm using the log_comment setting in my queries.
As it is a comment, it does not affect my query computation nor its result.
Describe the solution you'd like
In the cache key (the query), remove the log_comment setting if it exists.
Describe alternatives you've considered None.
Additional context
I have a draft PR ready to fix this issue with a regex.
As discussed offline, there is also the possibility to allow for setting a custom cache key on the client side. This does introduce a lot of complexity on the client side (the client needs to be aware of the query structure to make sure queries that are the same besides the log_comment are cached properly).
But using a regex might introduce subtle bugs. It only affects the cache key, and I have some test cases to cover it, but it is hard to be exhaustive here. We can make this an optional step in the cache key query parser, and introduce a setting so that it isn't enabled by default.
@mga-chka @sigua-cs @gontarzpawel if you can also take a look. If you have a strong preference for one or the other solution...
I didn't looked at the internal discussion but IMHO using a regexp is a bad idea since it will slow down the processing for an edge case. If you look at the code, chproxy already remove the "normal" comments from the hash avoiding regexps so we should at least have a similar mechanism https://github.com/ContentSquare/chproxy/blob/859e0c38065d0bcab821a0a58e9a974030dae5b4/utils.go#L189