soda-sql icon indicating copy to clipboard operation
soda-sql copied to clipboard

REGEX use on a collated VARCHAR column in Snowflake causes error

Open scott-fought opened this issue 3 years ago • 1 comments

Describe the bug If a Snowflake VARCHAR column is defined with collation, REGEX functions cause an error.

To Reproduce Steps to reproduce the behavior:

  1. Create a VARCHAR column in a Snowflake table
  2. Run soda analyze ...
  3. Or write scan file with valid_regex entry for column
  4. Run soda scan ... using that scan file

Context Snowflake does not support REGEX on collated columns. Collation can be removed from a column by wrapping the expression in, COLLATE({expr}, '')

OS: Mac OS Big Sur version 11.6 Python Version: Python 3.9.10 Soda SQL Version: 2.1.2 Warehouse Type: Snowflake

scott-fought avatar Feb 01 '22 20:02 scott-fought

Fix is in the branch 665-snowflake-collation. Though this removes collation from all VARCHAR columns, collated or not. While this is benign, it is unnecessary. Would it be more appropriate to have a conditional or some other way to select this feature?

scott-fought avatar Feb 01 '22 20:02 scott-fought