soda-sql icon indicating copy to clipboard operation
soda-sql copied to clipboard

`soda analyze` fails on first run against postgresql

Open sivy opened this issue 2 years ago • 5 comments

Describe the bug I tried to setup a new postgres warehouse with soda-sql cli and then run soda analyze against it.

To Reproduce Steps to reproduce the behavior:

  1. python3 -m venv .venv-soda
  2. source .venv-soda/bin/activate
  3. pip install soda-sql-postgresql soda-sql-snowflake
  4. soda create postgres -d thedb -u theuser
  5. Update ~/.soda/env_vars.yml with credentials
  6. soda analyze

Context

PostgreSQL version: 10

warehouse.yml:

name: postgres
connection:
  type: postgres
  host: the_host
  port: '5432'
  username: env_var(POSTGRES_USERNAME)
  password: env_var(POSTGRES_PASSWORD)
  database: thedb
  schema: public

Log:

% soda soda analyze                                                        
  | 2.1.0b18
  | Analyzing warehouse.yml ...
  | Querying warehouse for tables
  | Directory tables already exists
  | Executing SQL query: 
SELECT table_name 
FROM information_schema.tables 
WHERE lower(table_schema)='public'
  | SQL took 0:00:00.668961
  | Executing SQL query: 
SELECT column_name, data_type, is_nullable 
FROM information_schema.columns 
WHERE lower(table_name) = 'dual' 
  AND table_catalog = 'pmx' 
  AND table_schema = 'public'
  | SQL took 0:00:00.377569
  | Executing SQL query: 
SELECT 
  COUNT(CASE WHEN "dummy" ~* '^\-?[0-9]+$' THEN 1 END) AS number_whole,
  COUNT(CASE WHEN "dummy" ~* '^\-?[0-9]+\.[0-9]+$' THEN 1 END) AS number_decimal_point,
  COUNT(CASE WHEN "dummy" ~* '^\-?[0-9]+,[0-9]+$' THEN 1 END) AS number_decimal_comma,
  COUNT(CASE WHEN "dummy" ~* '^\-?\d+([\.,]\d+)? ?%$' THEN 1 END) AS number_percentage,
  COUNT(CASE WHEN "dummy" ~* '^\-?[0-9]+([\.][0-9]+)? ?%$' THEN 1 END) AS number_percentage_point,
  COUNT(CASE WHEN "dummy" ~* '^\-?[0-9]+([,][0-9]+)? ?%$' THEN 1 END) AS number_percentage_comma,
  COUNT(CASE WHEN "dummy" ~* '^(\$)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(\$|usd|USD)?$' THEN 1 END) AS number_money_usd,
  COUNT(CASE WHEN "dummy" ~* '^(€)? ?(\-)?([0-9]+[ ])*([0-9]+)(,[0-9]+)? ?(€|eur|EUR)?$' THEN 1 END) AS number_money_eur,
  COUNT(CASE WHEN "dummy" ~* '^(£)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(£|gbp|GBP)?$' THEN 1 END) AS number_money_gbp,
  COUNT(CASE WHEN "dummy" ~* '^(¥)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(¥|rmb|RMB)?$' THEN 1 END) AS number_money_rmb,
  COUNT(CASE WHEN "dummy" ~* '^(CHf)? ?(\-)?([0-9]+[''])*([0-9]+)(\.[0-9]+)? ?(CHf|chf|CHF)?$' THEN 1 END) AS number_money_chf,
  COUNT(CASE WHEN "dummy" ~* '^(((\$)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(\$|usd|USD)?)|((€)? ?(\-)?([0-9]+[ ])*([0-9]+)(,[0-9]+)? ?(€|eur|EUR)?)|((£)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(£|gbp|GBP)?)|((¥)? ?(\-)?([0-9]+[,])*([0-9]+)(\.[0-9]+)? ?(¥|rmb|RMB)?)|((CHf)? ?(\-)?([0-9]+[''])*([0-9]+)(\.[0-9]+)? ?(CHf|chf|CHF)?))$' THEN 1 END) AS number_money,
  COUNT(CASE WHEN "dummy" ~* '^([1-9]|0[1-9]|[12][0-9]|3[01])[-\./]([1-9]|0[1-9]|1[012])[-\./](19|20)?[0-9][0-9]' THEN 1 END) AS date_eu,
  COUNT(CASE WHEN "dummy" ~* '^([1-9]|0[1-9]|1[012])[-\./]([1-9]|0[1-9]|[12][0-9]|3[01])[-\./](19|20)?[0-9][0-9]' THEN 1 END) AS date_us,
  COUNT(CASE WHEN "dummy" ~* '^(19|20)[0-9][0-9][-\./]?([1-9]|0[1-9]|1[012])[-\./]?([1-9]|0[1-9]|[12][0-9]|3[01])' THEN 1 END) AS date_inverse,
  COUNT(CASE WHEN "dummy" ~* '^([01][0-9]|2[0-3]):([0-5][0-9])$' THEN 1 END) AS time_24h,
  COUNT(CASE WHEN "dummy" ~* '^(1[0-2]|0?[1-9]):[0-5][0-9]$' THEN 1 END) AS time_12h,
  COUNT(CASE WHEN "dummy" ~* '([0-9]|1[0-9]|2[0-4])[:-]([0-9]|[0-5][0-9])([:-]([0-9]|[0-5][0-9])(,[0-9]+)?)?$' THEN 1 END) AS time,
  COUNT(CASE WHEN "dummy" ~* '^([1-9][0-9]{3}-((0[1-9]|1[0-2])-(0[1-9]|1[0-9]|2[0-8])|(0[13-9]|1[0-2])-(29|30)|(0[13578]|1[02])-31)|([1-9][0-9](0[48]|[2468][048]|[13579][26])|([2468][048]|[13579][26])00)-02-29)T([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?(Z|[+-][01][0-9]:[0-5][0-9])?$' THEN 1 END) AS date_iso_8601,
  COUNT(CASE WHEN "dummy" ~* '^[0-9a-fA-F]{8}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{4}\-[0-9a-fA-F]{12}$' THEN 1 END) AS uuid,
  COUNT(CASE WHEN "dummy" ~* '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' THEN 1 END) AS ip_address,
  COUNT(CASE WHEN "dummy" ~* '^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$' THEN 1 END) AS ipv4_address,
  COUNT(CASE WHEN "dummy" ~* '^(([0-9a-fA-F]{1,4}:){7,7}[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,7}:|([0-9a-fA-F]{1,4}:){1,6}:[0-9a-fA-F]{1,4}|([0-9a-fA-F]{1,4}:){1,5}(:[0-9a-fA-F]{1,4}){1,2}|([0-9a-fA-F]{1,4}:){1,4}(:[0-9a-fA-F]{1,4}){1,3}|([0-9a-fA-F]{1,4}:){1,3}(:[0-9a-fA-F]{1,4}){1,4}|([0-9a-fA-F]{1,4}:){1,2}(:[0-9a-fA-F]{1,4}){1,5}|[0-9a-fA-F]{1,4}:((:[0-9a-fA-F]{1,4}){1,6})|:((:[0-9a-fA-F]{1,4}){1,7}|:)|fe80:(:[0-9a-fA-F]{0,4}){0,4}%[0-9a-zA-Z]{1,}|::(ffff(:0{1,4}){0,1}:){0,1}((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])|([0-9a-fA-F]{1,4}:){1,4}:((25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[0-9]){0,1}[0-9]))$' THEN 1 END) AS ipv6_address,
  COUNT(CASE WHEN "dummy" ~* '^[A-Za-z0-9.-_%]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$' THEN 1 END) AS email,
  COUNT(CASE WHEN "dummy" ~* '^((\+[0-9]{1,2}\s)?\(?[0-9]{3}\)?[\s.-])?[0-9]{3}[\s.-][0-9]{4}$' THEN 1 END) AS phone_number,
  COUNT(CASE WHEN "dummy" ~* '^[0-9]{14}|[0-9]{15}|[0-9]{16}|[0-9]{17}|[0-9]{18}|[0-9]{19}|([0-9]{4}-){3}[0-9]{4}|([0-9]{4} ){3}[0-9]{4}$' THEN 1 END) AS credit_card_number,
  COUNT("dummy") 
FROM (SELECT * FROM "public"."dual" LIMIT 1000) T
  | Exception: invalid regular expression: quantifier operand invalid
Traceback (most recent call last):
  File "/Volumes/Projects/DANG/clarity/.venv-soda/lib/python3.9/site-packages/sodasql/cli/cli.py", line 250, in analyze
    dataset_analyze_results = dataset_analyzer.analyze(warehouse, table_name)
  File "/Volumes/Projects/DANG/clarity/.venv-soda/lib/python3.9/site-packages/sodasql/dataset_analyzer.py", line 72, in analyze
    row = warehouse.sql_fetchone(
  File "/Volumes/Projects/DANG/clarity/.venv-soda/lib/python3.9/site-packages/sodasql/scan/warehouse.py", line 29, in sql_fetchone
    return sql_fetchone(self.connection, sql)
  File "/Volumes/Projects/DANG/clarity/.venv-soda/lib/python3.9/site-packages/sodasql/scan/db.py", line 22, in sql_fetchone
    return sql_fetchone_description(connection, sql)[0]
  File "/Volumes/Projects/DANG/clarity/.venv-soda/lib/python3.9/site-packages/sodasql/scan/db.py", line 35, in sql_fetchone_description
    cursor.execute(sql)
psycopg2.errors.InvalidRegularExpression: invalid regular expression: quantifier operand invalid

OS: MacOS 11.5.2

Python Version: Python 3.9.7

Soda SQL Version:

soda-sql-core==2.1.0b18
soda-sql-postgresql==2.1.0b18
soda-sql-snowflake==2.1.0b18

Warehouse Type: postgres

sivy avatar Oct 08 '21 17:10 sivy