
Loading data from a CSV into StarRocks: the trailing `'` remains after loading

Open asdfsx opened this issue 1 year ago • 2 comments

I am trying to load data from a CSV file into StarRocks; one of the rows is shown below. After loading, the data in StarRocks still contains the trailing `'`.

    /wp-admin/edit.php?post_type=post,'[{"support": 1.0, "item": 23802228611909434500675223637494144312}, {"support": 1.0, "item": 174472487042906779548867851357749048616}, {"support": 0.967741935483871, "item": 218473117571251542901396678363898265937}, {"support": 0.967741935483871, "item": 231554223331928306785505632432118524085}, {"support": 0.9354838709677419, "item": 251450111408553977209416691192262766043}, {"support": 0.8709677419354839, "item": 255844660388155386531595464064842665953}, {"support": 0.8709677419354839, "item": 43431313178276204057415236603873012077}, {"support": 0.8387096774193549, "item": 134183800354940827219830043105011992932}, {"support": 0.8064516129032258, "item": 124252827735266096217734490785960544895}, {"support": 0.7741935483870968, "item": 308786225706730603494773708575056993695}, {"support": 0.7741935483870968, "item": 107574640019033010051536354268230337001}, {"support": 0.7419354838709677, "item": 257105383019231995168090203574525649981}, {"support": 0.7419354838709677, "item": 55942444917999341049039493638467366617}, {"support": 0.7419354838709677, "item": 241284497872922364543607046042548588103}, {"support": 0.7096774193548387, "item": 224156408131500668504211608160659624317}, {"support": 0.6774193548387096, "item": 178058453208626738767032585709146263373}, {"support": 0.7419354838709677, "item": 332783421630035866156352528202461149697}]'

Steps to reproduce the behavior (Required)

  1. create table

    create table test.frequent_itemset_str(
     id bigint not null AUTO_INCREMENT,
     url STRING not NULL,
     itemsets STRING not NULL
    ) ENGINE = olap
    PRIMARY KEY (id);
    
  2. import data

    curl --location-trusted -u root             \
     -T ./test.csv        \
     -H "column_separator:,"                 \
     -H "skip_header:1"                      \
     -H "enclose:'"                         \
     -H "max_filter_ratio:1"                 \
     -H "columns: url, itemsets"                \
     -XPUT http://127.0.0.1:8030/api/test/frequent_itemset_str/_stream_load
    
  3. query

    SELECT id, url, parse_json(itemsets) FROM test.frequent_itemset_str;
    id |url                                               |parse_json(itemsets)|
    ---+--------------------------------------------------+--------------------+
    609|/wp-admin/edit.php?post_type=post&trashed=1&ids=53|                    |
    610|/wp-admin/edit.php?post_type=post&trashed=1&ids=65|                    |
    608|/wp-admin/edit.php?post_type=post&author=1        |                    |
    

The values in column `itemsets` should be JSON, but `parse_json` fails to convert them because of the trailing `'`.


Expected behavior (Required)

Real behavior (Required)

StarRocks version (Required)

3.3.0-19a3f66

asdfsx, Oct 10 '24

A simple example: the CSV file and the load script (screenshot).

Querying the table (screenshot).

asdfsx, Oct 21 '24

@jaogoy

asdfsx, Oct 21 '24

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

github-actions[bot], Apr 21 '25

@asdfsx If the data is correct and you are sure that `enclose` is not being processed correctly, then it is likely a bug in the `enclose` handling.

cc @wyb

jaogoy, Jul 28 '25

I have almost forgotten the details of this issue, since it was so long ago. I believe the data is correct, but I don't have an environment to reproduce the problem right now. I think reproducing it should be very easy, though. If you find that this isn't a bug, or that it has been fixed, I can close the issue. @jaogoy

asdfsx, Jul 29 '25
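Reproducing should indeed be cheap. As a sketch (the file names `repro_crlf.csv` and `repro_lf.csv` and the shortened JSON value are hypothetical, not from the original report), this shell snippet writes a header plus one row in the same shape as the sample data, once with Windows (CRLF) and once with Linux (LF) line endings. Since line endings are a common difference between CSV files produced on different platforms, both files can then be loaded with the `curl` command from step 2 and compared.

```shell
#!/usr/bin/env bash
# Sketch: write the same minimal CSV (header + one row) with two line-ending
# styles. File names and the shortened JSON value are hypothetical.
set -euo pipefail

header='url,itemsets'
# The JSON field is wrapped in single quotes, matching -H "enclose:'".
row="/wp-admin/edit.php?post_type=post,'[{\"support\": 1.0, \"item\": 1}]'"

# printf reuses its format string for every argument, so each line gets
# the same terminator: CR+LF in the first file, LF only in the second.
printf '%s\r\n' "$header" "$row" > repro_crlf.csv
printf '%s\n' "$header" "$row" > repro_lf.csv
```

Passing each file to `-T` in turn and querying `itemsets` afterwards shows whether the line ending alone changes how the enclosing `'` is handled.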

@wyb could you check whether this is a bug in the `enclose` handling?

jaogoy, Aug 05 '25

I ran into the same issue. After some investigation, I found that the problem only occurs when the CSV file uses Windows line endings (CR + LF).

If the file uses Linux line endings (LF), it works fine. The problem seems to be linked to the end-of-line character.

yvesblt, Aug 19 '25
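If CRLF line endings are indeed the trigger, one possible workaround is to strip the carriage returns before loading. This is a sketch, assuming the file has no CR characters inside quoted field values; the helper names `count_cr` and `to_lf` are hypothetical, and `dos2unix`, where installed, performs the same conversion.

```shell
#!/usr/bin/env bash
# Workaround sketch, assuming CRLF line endings are what leaves the trailing
# quote behind. Helper names are hypothetical.
set -euo pipefail

# Print the number of CR bytes in a file; non-zero suggests Windows line endings.
count_cr() {
  tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Copy a file with every CR byte removed, turning CRLF line endings into LF.
# Note: this also drops any CR inside quoted values, which the sample data lacks.
to_lf() {
  tr -d '\r' < "$1" > "$2"
}
```

For example, `to_lf ./test.csv ./test_lf.csv`, followed by the same `curl` command with `-T ./test_lf.csv`.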