sling-cli icon indicating copy to clipboard operation
sling-cli copied to clipboard

Best way to sling a MySQL BLOB of integers to a PostgreSQL int array

Open Codykilpatrick opened this issue 1 year ago • 9 comments

I have a MySQL source table with a BLOB column storing a list of integers from 0-255. Pixel values essentially. Is there an optimal way to sling these to a PostgreSQL table with an INT ARRAY column type? Appreciate any help!

Codykilpatrick avatar Jan 25 '24 19:01 Codykilpatrick

No way for native int array, but jsonb should be possible.

Since blob is mapped to text (see here), you will to cast it as a json yourself with the source.options.columns options. Put something like { my_blob_col: json }, and it should create a jsonb column in postgres.

If int array type in postgres is a must, best I can see is you running a target.options.post_sql query to do that yourself.

flarco avatar Jan 25 '24 19:01 flarco

Thanks! When setting the source column to "json" and running Sling, it reports that it can't insert into the target database because:

pq: invalid byte sequence for encoding "UTF8": 0x00

Does a transform need to be changed?

Codykilpatrick avatar Jan 25 '24 19:01 Codykilpatrick

can you try with source options { transforms: [ replace_0x00 ] }? That's the unicode NULL byte

flarco avatar Jan 25 '24 19:01 flarco

Aha, stuck on an older Sling version at the moment but will try that as soon as possible.

Codykilpatrick avatar Jan 25 '24 20:01 Codykilpatrick

Looking at https://stackoverflow.com/a/25600307

Can you try one of the decode transforms? See https://github.com/slingdata-io/sling-cli/blob/main/core/sling/tranforms.go.

On Thu, Jan 25, 2024, 6:21 PM Cody Kilpatrick @.***> wrote:

can you try with source options { transforms: [ replace_0x00 ] }? That's the unicode NULL byte

That worked after switching to v1.0.71, but now it complains about byte character 0xba 😄

— Reply to this email directly, view it on GitHub https://github.com/slingdata-io/sling-cli/issues/132#issuecomment-1911017853, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2QZYUK5ZRERUQQCIWFRBDYQLEGZAVCNFSM6AAAAABCLBXO26VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJRGAYTOOBVGM . You are receiving this because you commented.Message ID: @.***>

flarco avatar Jan 25 '24 21:01 flarco

Also try a different encoding, not too familiar with MySQL advanced settings: ALTER DATABASE mydb CHARSET utf8;

flarco avatar Jan 25 '24 23:01 flarco

Just some updates not success yet but I tried: "transforms": ["replace_0x00", "Each decode transformation"] "hex_representation": "text and json" With my postgres table typed as both jsonb and text.

The closest result I got was with just replace_0x00, no transforms into a postgres text column and I received this error: pq: invalid byte sequence for encoding "UTF8": 0xc1 0xe4 Which almost feels like the value in my int array are being converted to Ascii?

Also just to re-set to the same page: I am running sling with this command:

docker run --network=forerunner-network --rm -v "/Users/codykilpatrick/code/spear/forerunner/packages/forerunner-legacy-api/sling-docker/task.json:/task.json" -i slingdata/sling:v1.0.71 run -d  -c  task.json

my task.json:

{
  "source": {
    "conn": "mysql_conn_string",
    "stream": "SELECT heading, typekey, bearing_time_configuration, date, id, pixels AS hex_representation FROM forerunner_legacy.bearing_time_record WHERE {incremental_where_cond};",
    "primary_key": "id",
    "update_key": "date",
    "options": {}
  },
  "target": {
    "conn": "postgres_conn_string",
    "object": "forerunner_public.bearing_time_record_staging",
    "options": {
      "table_tmp": "forerunner_public.bearing_time_record_staging_temp",
      "pre_sql": "CREATE INDEX temp_idx_id ON forerunner_public.bearing_time_record_staging_temp (id);"
    }
  },
  "mode": "incremental"
}

My Postgres Table

CREATE TABLE forerunner_public.bearing_time_record_staging (
    id int PRIMARY KEY,
    hex_representation text,
    date bigint,
    heading bigint,
    bearing_time_configuration text,
    typekey int,
    is_synced boolean
);

My MySQL table:

CREATE TABLE bearing_time_record (
    id INT NOT NULL,
    date BIGINT NOT NULL,
    heading INT NOT NULL,
    pixels BLOB NOT NULL,
    bearing_time_configuration VARCHAR(255) NOT NULL,
    typekey INT NOT NULL,
    PRIMARY KEY (id)
);

Also thank you so much for all of your help, you don't know how appreciative I am.

Codykilpatrick avatar Jan 26 '24 00:01 Codykilpatrick

Ah ok, can you find out what the default charset / encoding is for your MySQL source?

On Thu, Jan 25, 2024, 9:08 PM Cody Kilpatrick @.***> wrote:

Just some updates not success yet but I tried: "transforms": ["replace_0x00", "Each decode transformation"] "hex_representation": "text and json" With my postgres table typed as both jsonb and text.

The closest result I got was with just replace_0x00, no transforms into a postgres text column and I received this error: pq: invalid byte sequence for encoding "UTF8": 0xc1 0xe4 Which almost feels like the value in my int array are being converted to Ascii?

Also just to re-set to the same page: I am running sling with this command:

docker run --network=forerunner-network --rm -v "/Users/codykilpatrick/code/spear/forerunner/packages/forerunner-legacy-api/sling-docker/task.json:/task.json" -i slingdata/sling:v1.0.71 run -d -c task.json

my task.json:

{ "source": { "conn": "mysql_conn_string", "stream": "SELECT heading, typekey, bearing_time_configuration, date, id, pixels AS hex_representation FROM forerunner_legacy.bearing_time_record WHERE {incremental_where_cond};", "primary_key": "id", "update_key": "date", "options": {} }, "target": { "conn": "postgres_conn_string", "object": "forerunner_public.bearing_time_record_staging", "options": { "table_tmp": "forerunner_public.bearing_time_record_staging_temp", "pre_sql": "CREATE INDEX temp_idx_id ON forerunner_public.bearing_time_record_staging_temp (id);" } }, "mode": "incremental" }

My Postgres Table

CREATE TABLE forerunner_public.bearing_time_record_staging ( id int PRIMARY KEY, hex_representation text, date bigint, heading bigint, bearing_time_configuration text, typekey int, is_synced boolean );

My MySQL table:

CREATE TABLE bearing_time_record ( id INT NOT NULL, date BIGINT NOT NULL, heading INT NOT NULL, pixels BLOB NOT NULL, bearing_time_configuration VARCHAR(255) NOT NULL, typekey INT NOT NULL, PRIMARY KEY (id) );

Also thank you so much for all of your help, you don't know how appreciative I am.

— Reply to this email directly, view it on GitHub https://github.com/slingdata-io/sling-cli/issues/132#issuecomment-1911198242, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB2QZYX5TOK5MLKAQYZXQDTYQLXW5AVCNFSM6AAAAABCLBXO26VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJRGE4TQMRUGI . You are receiving this because you commented.Message ID: @.***>

flarco avatar Jan 26 '24 01:01 flarco

Hey Sorry for the late response on this one. This query: SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE schema_name = "my_db";

Returns: DEFAULT_CHARACTER_SET_NAME = latin1 DEFAULT_COLLATION_NAME=latin1_swedish_ci

Codykilpatrick avatar Jan 30 '24 13:01 Codykilpatrick