ClickHouse string ordering and string filtering by UTF-8 instead of bytes

Open: casab opened this issue 1 year ago • 3 comments

By default, ClickHouse orders strings by their bytes, and string functions such as lower and upper only handle ASCII. To overcome this limitation, ClickHouse provides the COLLATE keyword and the lowerUTF8 and upperUTF8 functions.
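To illustrate why byte ordering is surprising, here is a minimal TypeScript sketch that compares a byte-wise sort with a locale-aware sort. It only mimics the behavior on the JavaScript side: `compareBytes` is a hypothetical helper, the word list is made up, and `Intl.Collator` is used as an analogy for ICU-based collation, not as ClickHouse's implementation.

```typescript
// Hypothetical illustration; none of these names come from the PR.
const encoder = new TextEncoder();

// Compare two strings by their UTF-8 bytes, similar in spirit to a plain
// ORDER BY on a String column without COLLATE.
function compareBytes(a: string, b: string): number {
  const x = encoder.encode(a);
  const y = encoder.encode(b);
  const n = Math.min(x.length, y.length);
  for (let i = 0; i < n; i++) {
    if (x[i] !== y[i]) return x[i] - y[i];
  }
  return x.length - y.length;
}

const words = ["apple", "Zebra", "éclair", "banana"];

// Uppercase ASCII sorts before lowercase, and multi-byte characters sort last.
const byBytes = [...words].sort(compareBytes);
console.log(byBytes); // [ 'Zebra', 'apple', 'banana', 'éclair' ]

// A locale-aware comparison, analogous to ORDER BY ... COLLATE 'en'.
const byCollation = [...words].sort(new Intl.Collator("en").compare);
console.log(byCollation);
```

With byte ordering, "Zebra" comes first because `Z` (0x5A) is smaller than `a` (0x61), which is rarely what users expect from an alphabetical sort.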

Check List

  • [X] Tests have been run in packages where changes were made, if available
  • [X] Linter has been run for changed code
  • [ ] Tests for the changes have been added if not covered yet
  • [X] Docs have been added / updated if required

Description of Changes Made (if issue reference is not provided)

  • Replaced the CONCAT SQL function with a JS template literal to avoid an unnecessary DB function call
  • Used lowerUTF8 instead of lower to support UTF-8-compatible search
  • Added COLLATE 'en' when ordering by strings so that ordering is case-insensitive. ClickHouse orders by bytes by default.
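The three changes above can be sketched roughly as follows. This is not the code from the PR: `quoteLiteral`, `containsFilter`, and `orderByString` are hypothetical helpers, the column name in the usage is made up, and the sketch inlines an escaped literal where a real driver would more likely bind a parameter. It only shows the shape of the generated SQL: the `%` wildcards are added with a template literal instead of `CONCAT`, the column is wrapped in `lowerUTF8`, and `COLLATE 'en'` follows the sort direction.

```typescript
// Hypothetical helpers for illustration only.

// Escape a value for use inside a single-quoted ClickHouse string literal.
function quoteLiteral(value: string): string {
  return `'${value.replace(/\\/g, "\\\\").replace(/'/g, "\\'")}'`;
}

// Build a case-insensitive "contains" filter. The wildcards are added in JS
// with a template literal rather than with CONCAT in SQL. LIKE wildcards
// inside `term` itself are not escaped in this sketch.
function containsFilter(column: string, term: string): string {
  const pattern = `%${term.toLowerCase()}%`;
  return `lowerUTF8(${column}) LIKE ${quoteLiteral(pattern)}`;
}

// Build an ORDER BY expression for a string column. In ClickHouse, COLLATE
// is written after ASC or DESC.
function orderByString(column: string, direction: "ASC" | "DESC" = "ASC"): string {
  return `${column} ${direction} COLLATE 'en'`;
}

console.log(containsFilter("name", "Ça")); // lowerUTF8(name) LIKE '%ça%'
console.log(orderByString("name", "DESC")); // name DESC COLLATE 'en'
```

Lowercasing the search term in JavaScript and the column with `lowerUTF8` keeps both sides of the comparison on Unicode-aware case folding, which plain `lower` does not provide for non-ASCII characters.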

casab, Feb 09 '23 14:02