[Bug] [localFile-sink] Plugin saves open text tabular data regardless of fileFormat value

Open mgierdal opened this issue 1 year ago • 5 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

I am trying to sink FakeSource data to a file. The output is always written in the same format, regardless of the fileFormat value:

xZUuh696150845
vsURy1080328609
bmcPC267811950
xkYLR1744336535

If I modify formatting by introducing

field_delimiter = "|"
row_delimiter = "\n"

the plugin behaves accordingly:

brLkB|758957279
HGPBr|570409452
AlWrl|407909156
FdHHx|1323179300
VFgHy|602615979

Unfortunately, this again happens regardless of the declared file format.

SeaTunnel Version

2.3.2

SeaTunnel Config

{
    "env" : {
        "execution.parallelism" : 2,
        "job.mode" : "BATCH",
        "checkpoint.interval" : 10000
    },
    "source" : [
        {
            "schema" : {
                "fields" : {
                    "name" : "string",
                    "age" : "int"
                }
            },
            "row.num" : 16,
            "parallelism" : 2,
            "result_table_name" : "fake",
            "plugin_name" : "FakeSource"
        }
    ],
    "sink" : [
        {
            "path" : "tjson",
            "plugin_name" : "localFile",
            "fileFormat" : "json"
        }
    ]
}

Running Command

./bin/seatunnel.sh --config ./config/test_to_file.config -m local

Error Exception

none

Flink or Spark Version

none

Java or Scala Version

java 17.0.7 2023-04-18 LTS

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

mgierdal avatar Jun 21 '23 18:06 mgierdal

Please provide the .conf file.

zhilinli123 avatar Jun 24 '23 09:06 zhilinli123

Please show your configuration file.

ddww avatar Jun 26 '23 03:06 ddww

The config file in extenso:

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######
###### This config file is a demonstration of streaming processing in Seatunnel config
######

env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  # This is a example source plugin **only for test and demonstrate the feature source plugin**
  FakeSource {
    parallelism = 2
    result_table_name = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }

  # If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
  # please go to https://seatunnel.apache.org/docs/category/source-v2
}
sink {
  localFile {
    path="tjson"
    fileFormat="json"
#    field_delimiter = "|"
#    row_delimiter = "\n"
  }
  # If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
  # please go to https://seatunnel.apache.org/docs/category/sink-v2
}

mgierdal avatar Jun 26 '23 16:06 mgierdal

I checked your configuration file. The problem is simple. First, there is no configuration item named fileFormat. Second, I found that the official documentation also has some issues on this point. Try file_format_type = "json" instead.

ddww avatar Jul 04 '23 06:07 ddww

Indeed, documentation was misleading. Changing a config field name from fileFormat to file_format_type made it work. Thanks!
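
For anyone landing here, a minimal sketch of the sink block after the rename, assuming the rest of the config posted above stays unchanged (the delimiter options are left out because they were only needed for the text output):

```hocon
sink {
  localFile {
    path = "tjson"
    # Renamed from fileFormat, which is not a recognized option in 2.3.2
    file_format_type = "json"
  }
}
```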

mgierdal avatar Jul 11 '23 19:07 mgierdal