embulk-output-redshift is not compatible with SUPER type

Open hibira opened this issue 2 years ago • 0 comments

embulk-output-redshift does not support Redshift's SUPER type.

Currently, only VARBYTE can be used to send data to Redshift. However, VARBYTE has an upper limit of 65535 bytes, so it is not possible to migrate strings longer than that.
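
Since the limit is in bytes, not characters, multi-byte text reaches it sooner than its character count suggests. A minimal sketch for checking a value against it before loading; `exceeds_limit` and `LIMIT_BYTES` are hypothetical names, not part of Embulk or the plugin, and the figure is the one quoted above:

```python
# Hypothetical helper: not part of Embulk or embulk-output-redshift.
# 65535 is the byte limit quoted in this issue.
LIMIT_BYTES = 65535


def exceeds_limit(value: str, limit: int = LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoding of value is longer than limit bytes."""
    return len(value.encode("utf-8")) > limit


print(exceeds_limit("a" * 65535))   # False: exactly at the limit
print(exceeds_limit("a" * 65536))   # True: one byte over
print(exceeds_limit("あ" * 30000))  # True: 3 bytes per character in UTF-8
```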

Supporting Redshift's SUPER type would allow longer strings to be migrated.

Note that the following settings can be used as a workaround, but I would like to see formal support for this.

In the in section, the column type (set in columns or column_options) must be json. When I used string instead, the file used by COPY was corrupted.
In the out section, the column_options entry must be { type: "SUPER", value_type: json }. When value_type was string, a 65535-byte constraint error occurred.

in:
  type: file
  path_prefix: ./input.csv
  parser:
    charset: UTF-8
    newline: CRLF
    type: csv
    delimiter: ","
    quote: "'"
    escape: "'"
    null_string: "NULL"
    skip_header_lines: 1
    columns:
      - { name: col1, type: long }
      - { name: col2, type: string }
      - { name: col3, type: json }
out:
  type: redshift
  host: xxxxxx
  user: xxxxxx
  password: xxxxxx
  database: dev
  table: sample_table
  aws_access_key_id: xxxxxx
  aws_secret_access_key: xxxxxx
  iam_user_name: xxxxxx
  s3_bucket: xxxxxx
  s3_key_prefix: xxxxxx
  mode: insert
  column_options:
    col1: { type: "INTEGER" }
    col2: { type: "VARCHAR(255)" }
    col3: { type: "SUPER", value_type: json }
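
To try the workaround, an input file with a col3 value longer than 65535 bytes is needed. The following is a rough sketch of one way to generate it, not part of Embulk or the plugin. `build_csv` is a hypothetical helper, and it assumes that Python's doubled-quote escaping (`''`) is read correctly by the parser settings above (`quote: "'"`, `escape: "'"`).

```python
import csv
import io
import json


def build_csv(rows):
    """Render (col1, col2, col3) rows as CSV text for the parser config above."""
    buf = io.StringIO(newline="")
    writer = csv.writer(
        buf,
        delimiter=",",
        quotechar="'",
        doublequote=True,       # a literal ' inside a field is written as ''
        lineterminator="\r\n",  # matches newline: CRLF
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writerow(["col1", "col2", "col3"])  # dropped by skip_header_lines: 1
    for col1, col2, col3 in rows:
        # None becomes the bare token NULL, matching null_string: "NULL"
        writer.writerow([col1, col2, "NULL" if col3 is None else json.dumps(col3)])
    return buf.getvalue()


if __name__ == "__main__":
    big = {"text": "x" * 70000}  # one JSON document well over 65535 bytes
    text = build_csv([(1, "short", big), (2, "no json", None)])
    # newline="" keeps the CRLF line endings exactly as generated
    with open("input.csv", "w", newline="", encoding="utf-8") as f:
        f.write(text)
```

Running the script writes `input.csv` to the current directory, which is the path that `path_prefix: ./input.csv` points to.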

Dec 14 '23 08:12 hibira