seatunnel
[Feature][File Connector] Unified Field Delimiter Configuration
Search before asking
- [X] I had searched in the feature and found no similar feature requirement.
Description
If I want to write data to a txt file with field_delimiter | and then read the file back,
I have to configure field_delimiter differently in the source and the sink,
like this:
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
env {
parallelism = 1
job.mode = "BATCH"
# You can set spark configuration here
spark.app.name = "SeaTunnel"
spark.executor.instances = 2
spark.executor.cores = 1
spark.executor.memory = "1g"
spark.master = local
}
source {
LocalFile {
path = "/Users/gaojun/tmp/test.txt"
schema {
fields {
id = string
create_time = string
revoke_time = string
revoked = string
update_time = string
contact_email = string
expired = string
last_login_time = string
locked = string
locked_time = string
nick_name = string
password = string
password_expired_timestamp = string
password_modify_time = string
times_of_login_failure = string
user_type = string
username = string
department_id = string
sale_isolation = string
}
}
file_format_type = "text"
delimiter="\\|\u001B"
result_table_name = "fake"
}
}
sink {
LocalFile {
path = "/Users/gaojun/tmp/seatunnel/text"
row_delimiter = "\n"
file_format_type = "text"
filename_time_format = "yyyy.MM.dd"
is_enable_transaction = true
field_delimiter="|\u001B"
}
}
You can see that field_delimiter in the source is \\|\u001B and in the sink is |\u001B. From the code, I found that in the source we use line.split(xxx) to split each line into fields. split interprets its argument as a regular expression, where | is a metacharacter; if we use line.split("|\u001B"), an exception is thrown, so field_delimiter must be \\|\u001B in the source. In the sink code we use String.join(xxx, fields) to join all fields into a row. If we use String.join("\\|\u001B", fields), the \ is written into the file, and reading that file with \\|\u001B then produces an error.
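To make the asymmetry concrete, here is a minimal Java sketch. It is not SeaTunnel code: the class DelimiterMismatch and the helpers writeRow and readRow are hypothetical stand-ins, assuming only that the sink joins fields with String.join and the source splits lines with String.split, as described above.

```java
public class DelimiterMismatch {

    // Hypothetical stand-in for the sink: the delimiter is used literally.
    public static String writeRow(String delimiter, String... fields) {
        return String.join(delimiter, fields);
    }

    // Hypothetical stand-in for the source: the delimiter is used as a regex.
    public static String[] readRow(String regex, String line) {
        return line.split(regex);
    }

    public static void main(String[] args) {
        String escaped = "\\|\u001B"; // backslash, pipe, ESC (what the source needs)
        String plain = "|\u001B";     // pipe, ESC (what the sink needs)

        // Sink writes with the plain form, source reads with the escaped form:
        // the row round-trips into two fields.
        String good = writeRow(plain, "1", "alice");
        System.out.println(readRow(escaped, good).length);

        // Sink writes with the escaped form: the backslash lands in the file
        // and stays attached to the first field when the row is read back.
        String bad = writeRow(escaped, "1", "alice");
        System.out.println(readRow(escaped, bad)[0].endsWith("\\"));
    }
}
```

The first call prints 2, and the second prints true because the first field comes back with a trailing backslash.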
So we need a way to unify the delimiter configuration, so that the same value works in both source and sink and users are not inconvenienced.
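One possible direction, sketched in Java under assumptions: treat the configured delimiter as a literal string on both sides and quote it with java.util.regex.Pattern.quote before splitting. The class UnifiedDelimiter and the helpers splitLiteral and joinLiteral are hypothetical names for illustration, not part of the connector.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class UnifiedDelimiter {

    // Hypothetical source-side helper: split on the delimiter taken literally.
    public static String[] splitLiteral(String line, String delimiter) {
        // Pattern.quote removes the special meaning of regex metacharacters
        // such as '|'; the -1 limit keeps trailing empty fields.
        return line.split(Pattern.quote(delimiter), -1);
    }

    // Hypothetical sink-side helper: join with the same literal delimiter.
    public static String joinLiteral(List<String> fields, String delimiter) {
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        String delimiter = "|\u001B"; // one value shared by source and sink
        List<String> fields = Arrays.asList("1", "alice", "");

        String row = joinLiteral(fields, delimiter);
        String[] parsed = splitLiteral(row, delimiter);

        // Check that the fields survive the write/read round trip.
        System.out.println(Arrays.asList(parsed).equals(fields));
    }
}
```

With this approach the user would write the unescaped delimiter in both places. Whether existing configs that already use the escaped form should keep working is a compatibility question this sketch does not address.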
Usage Scenario
No response
Related issues
No response
Are you willing to submit a PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I didn't find any problems using delimiter = "\|" directly.
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
This issue has been closed because it has not received response for too long time. You could reopen it if you encountered similar problems in the future.