trino icon indicating copy to clipboard operation
trino copied to clipboard

Support text type predicate pushdown with collation in MySQL

Open brandboat opened this issue 2 years ago • 5 comments

Description

related to https://github.com/trinodb/trino/issues/7496, support text type predicate pushdown in MySQL connector, this pr mainly did predicate pushdown, and didn't involved with top-n/aggregate/join text type pushdown.

This pr use utf8mb4_0900_bin which is supported in mysql 8.0.17+, and according to https://github.com/trinodb/trino/issues/7496#issuecomment-1231331703, utf8mb4_0900_bin is the one which is under utf8mb4 charset and doesn’t ignore trailing spaces. To enable this collation, we need to specify character_set_server in jdbc properties through MySqlConfig.

Non-technical explanation

Release notes

(x) This is not user-visible and no release notes are required. ( ) Release notes are required, please propose a release note for me. ( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

brandboat avatar Sep 06 '22 16:09 brandboat

gentle ping @hashhar, could you take a look ? Many thanks

brandboat avatar Sep 07 '22 10:09 brandboat

some concerns about the larger direction @brandboat.

We have experimental toggle in Postgresql already. It's experimental because it degrades performance for queries with equality conditions (because = without COLLATE can use index but with COLLATE won't be able to use index).

I'd prefer to follow the following steps:

  • Improve CollationAwareQueryBuilder so that it only applies COLLATE for inequalities (to not introduce performance degradation for equalities)
  • Remove experimental prefix from the config in PostgreSQL and enable it by default
  • Add config in MySQL enabled by default

cc: @findepi

I'm ok with adding this in MySQL directly but it means additional work in two places to fix the performance issue with equalities.

hashhar avatar Sep 20 '22 11:09 hashhar

@hashhar AFAIR MySQL and PostgreSQL default collations are different with respect to equalities and case sensitivity.

findepi avatar Sep 20 '22 21:09 findepi

I'd prefer to follow the following steps:

  • Improve CollationAwareQueryBuilder so that it only applies COLLATE for inequalities (to not introduce performance degradation for equalities)
  • Remove experimental prefix from the config in PostgreSQL and enable it by default
  • Add config in MySQL enabled by default

hashhar, many thanks for your comment, not quite sure if I understand what you mean but MySql default collation is case insensitive (not like pg), I'm not sure if we can handle equalities the same way we handled in postgresql.

It's experimental because it degrades performance for queries with equality conditions (because = without COLLATE can use index but with COLLATE won't be able to use index).

I've googled and found that might happened in mysql too. While here I mark this config (mysql.experimental.enable-string-pushdown-with-collate) as experimental, user can choose to turn it on/off depending on there scenario.

brandboat avatar Sep 21 '22 15:09 brandboat

BTW if we don't do collation push down for equalities, in some cases join push down won't take effect e.g. copy tpch table orders and lineitem in mysql and run select o_t.orderkey from (select * from orders where comment='foo') o_t left join lineitem l on o_t.orderkey = l.orderkey. left table in plan will be FilterNode(TableScanNode) and right table will be TableScanNode and that will cause we can't benefit from PushJoinIntoTableScan since the PATTERN in it requires both left/right table to be tableScan node.

brandboat avatar Sep 21 '22 17:09 brandboat

:wave: @brandboat and @hashhar - this PR has become inactive. We hope you are still interested in working on it. Please let us know, and we can try to get reviewers to help with that.

We're working on closing out old and inactive PRs, so if you're too busy or this has too many merge conflicts to be worth picking back up, we'll be making another pass to close it out in a few weeks.

mosabua avatar Jan 12 '24 21:01 mosabua