starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

StarRocks export data to csv format error

Open boolean-dev opened this issue 1 year ago • 0 comments

Steps to reproduce the behavior (Required)

1. Objective

I want to import data from StarRocks into PostgreSQL, so I have taken the following steps:

  1. Export the data from StarRocks in CSV format to S3, there are two approaches:

    1. Export data using EXPORT | StarRocks
    2. Unload data using INSERT INTO FILES | StarRocks
  2. Use AWS PostgreSQL to import the CSV data from S3 into the database, the approach is as follows:

    1. Importing data from Amazon S3 to your RDS for PostgreSQL DB instance - Amazon Relational Database Service

2. Basic Data

My data contains several VARCHAR fields, where the data stored includes common delimiters such as |​, ,​ and #​ within the strings.

Table structure:

CREATE TABLE test_table
(
    id              INT            NULL,
    lrn             VARCHAR(255)   NULL,
    awb             VARCHAR(255)   NULL,
    tracking_number VARCHAR(255)   NULL,
    recipient       VARCHAR(65533) NULL,
    sender          VARCHAR(65533) NULL,
    currency        VARCHAR(255)   NULL,
    item            VARCHAR(65533) NULL,
    container       VARCHAR(255)   NULL,
    created_at      DATETIME       NULL,
    updated_at      DATETIME       NULL,
    ioss            VARCHAR(255)   NULL,
    parcel_no       VARCHAR(255)   NULL,
    vat_number      VARCHAR(255)   NULL,
    origin_ioss     VARCHAR(255)   NULL,
    update_ioss_at  DATETIME       NULL,
    archived        TINYINT(1)     NULL,
    transport_cost  DECIMAL(9, 2)  NULL
)

Data as follows:

49321237,
FTLaaaaaaa256993,
235-111111934,
6G58638256993,
{"city": "| COI****THIER", "name": "an****HER", "address": "798 **** Mo", "country": "FR", "zipcode": "7***0", "address2": "", "cityCode": "73"},
{"city": "BEIJING", "name": "BEIJ ***  LTD", "address": "R *** HENG", "country": "CN", "zipcode": "1***23", "address2": "SHI***R", "cityCode": "10"},
EUR,
{"count": 1, "value": 0.58, "hsCode": "6***9", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.187, "opedest": "O***ONNEL", "currency": "EUR", "describe": "CLOTHES", "nattrans": 11, "packagingType": ""},
FRFR1111111156,
2023-01-01 04:03:11.311006,
2023-01-01 04:03:11.311006,
\N,
\N,
\N,
\N,
\N,
\N,
false,
\N

4. EXPORT Plan

Export SQL

EXPORT TABLE test_table
PARTITION ('p202212') 
TO "s3://test_bucket/**/s1/"
PROPERTIES
(
    "column_separator"=","
) 
WITH BROKER
(
    "aws.s3.access_key" = "AAAAAAAAAAAA",
    "aws.s3.secret_key" = "KKKKKKKKKKKKKKKKKKKKKKKK",
    "aws.s3.region" = "eu-west-1"
);

CSV data

49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY", "address": "5 SaaaaN", "country": "FR", "zipcode": "21110", "address2": "", "cityCode": "29"},{"city": "SHaaaaHAI", "name": "SHANaaaaLTD", "address": "Pudoaaaat", "country": "CN", "zipcode": "211115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 24.89, "hsCode": "3111110", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.624, "opedest": "OCaaaaaNEL", "currency": "EUR", "describe": "Diaaaaent", "nattrans": 11, "packagingType": ""},YWFTL11111111103001,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,R8j5uaiaaaaaaaaaHBFw==,\N,\N,\N,\N,true,0.00

The JSON data in the CSV contains ,​, it should be enclosed with "​:

"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}"

5. INSERT INTO FILES Plan

sql

INSERT INTO 
FILES(
    "path" = "s3://test_bucket/**/insert",
    "format" = "csv",
    "csv.column_separator"=",",
    "csv.enclose"='"',
    "csv.escape"="\\",
    "compression" = "uncompressed",
    "target_max_file_size" = "1024", -- 1KB
    "aws.s3.access_key" = "AKIAaaaaaaaaSMKS",
    "aws.s3.secret_key" = "TpiojLaaaaaaaaaaaauDDaYZr",
    "aws.s3.region" = "eu-west-1"
)
SELECT * FROM test_table LIMIT 1000;

csv data

49318068,FTL61111541685,157-1111113,6G511111111185,{"city": "ME11C", "name": "CaaETTE", "address": "10aa rige", "country": "FR", "zipcode": "11110", "address2": "", "cityCode": "19"},{"city": "SH11AI", "name": "SHAN11SS CO.LTD", "address": "P11 ort", "country": "CN", "zipcode": "2115", "address2": "", "cityCode": "20"},EUR,{"count": 1, "value": 14.09, "hsCode": "8111", "modrep": 3, "nbrart": 1, "nbrcol": 1, "weight": 0.34, "opedest": "OCC111EL", "currency": "EUR", "describe": "Camping light", "nattrans": 11, "packagingType": ""},YW11113058,2022-12-31 10:47:50.911922,2022-12-31 10:47:50.911922,GUub04OA1111E51kG+p2Sg==,\N,\N,\N,\N,true,0.00

The JSON data in the above CSV still appears incorrectly

Expected behavior (Required)

the csv export data should be like this:

49318048,FTL6G58753149446,157-87924513,6G58753149446,"{""city"": ""BaaaaST"", ""name"": ""ARNaaaaRY""}",\N,....

Real behavior (Required)

but the csv data is this:

49318060,FTL611111151839,157-1111113,6G58111111119,{"city": "BaaaaST", "name": "ARNaaaaRY"},\N,....

StarRocks version (Required)

3.3.4

boolean-dev avatar Oct 17 '24 01:10 boolean-dev