seatunnel
[Bug] [localFile-sink] Plugin saves open text tabular data regardless of fileFormat value
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
I am trying to sink FakeSource data to a file. The output is always saved in the same format, regardless of the fileFormat value:
xZUuh696150845
vsURy1080328609
bmcPC267811950
xkYLR1744336535
If I modify formatting by introducing
field_delimiter = "|"
row_delimiter = "\n"
the plugin behaves accordingly:
brLkB|758957279
HGPBr|570409452
AlWrl|407909156
FdHHx|1323179300
VFgHy|602615979
Unfortunately, again regardless of the declared file format.
SeaTunnel Version
2.3.2
SeaTunnel Config
{
"env" : {
"execution.parallelism" : 2,
"job.mode" : "BATCH",
"checkpoint.interval" : 10000
},
"source" : [
{
"schema" : {
"fields" : {
"name" : "string",
"age" : "int"
}
},
"row.num" : 16,
"parallelism" : 2,
"result_table_name" : "fake",
"plugin_name" : "FakeSource"
}
],
"sink" : [
{
"path" : "tjson",
"plugin_name" : "localFile",
"fileFormat" : "json"
}
]
}
Running Command
./bin/seatunnel.sh --config ./config/test_to_file.config -m local
Error Exception
none
Flink or Spark Version
none
Java or Scala Version
java 17.0.7 2023-04-18 LTS
Screenshots
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Please provide your .conf configuration file.
The config file in extenso:
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
######
###### This config file is a demonstration of streaming processing in Seatunnel config
######
env {
# You can set SeaTunnel environment configuration here
execution.parallelism = 2
job.mode = "BATCH"
checkpoint.interval = 10000
#execution.checkpoint.interval = 10000
#execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}
source {
# This is a example source plugin **only for test and demonstrate the feature source plugin**
FakeSource {
parallelism = 2
result_table_name = "fake"
row.num = 16
schema = {
fields {
name = "string"
age = "int"
}
}
}
# If you would like to get more information about how to configure Seatunnel and see full list of source plugins,
# please go to https://seatunnel.apache.org/docs/category/source-v2
}
sink {
localFile {
path="tjson"
fileFormat="json"
# field_delimiter = "|"
# row_delimiter = "\n"
}
# If you would like to get more information about how to configure Seatunnel and see full list of sink plugins,
# please go to https://seatunnel.apache.org/docs/category/sink-v2
}
I checked your configuration file. The problem is simple. First, there is no configuration item named fileFormat. Second, the official documentation has some issues on this point as well. Try file_format_type = "json" instead.
Indeed, the documentation was misleading. Changing the config field name from fileFormat to file_format_type made it work. Thanks!
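For anyone landing here later, a minimal sketch of the sink block with the renamed option, assuming the rest of the reported configuration (env and FakeSource) stays unchanged. The path and plugin name are copied from the report above. The commented-out delimiters are the ones from the original test, and whether they have any effect on json output is not established in this thread.

```hocon
sink {
  localFile {
    path = "tjson"
    # Renamed from fileFormat, which this sink has no configuration item for
    file_format_type = "json"
    # Delimiters from the original report, left disabled here
    # field_delimiter = "|"
    # row_delimiter = "\n"
  }
}
```

The job is then run with the same command as before: ./bin/seatunnel.sh --config ./config/test_to_file.config -m local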