
[Bug] [FTP Connector] The reading and writing of FTP are very slow

Open Xuzhengz opened this issue 1 year ago • 3 comments

Search before asking

  • [X] I have searched the issues and found no similar issues.

What happened

Reading from and writing to FTP is very slow. I tried connecting to the FTP server directly and it took a few seconds, and I have ruled out the connection as the cause of the slowness. When reading, creating the task takes a long time, and assigning the FTP read work to subtasks is also slow. When writing, the log shows the classloader being released over and over; only one record is written, yet the task takes several minutes to complete.
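To check the observation that connecting alone takes a few seconds, independently of SeaTunnel, one could time the TCP connect plus the server's first reply. The following is a minimal Java 8 sketch using only the JDK, assuming the redacted host and port from the config are substituted in. The class `FtpGreetingTimer` and its methods are hypothetical, and the measurement deliberately excludes login, passive-mode negotiation and data transfer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical diagnostic: measures how long it takes to open a TCP
 * connection to an FTP server and read the first line of its greeting.
 * Login, passive-mode setup and data transfer are NOT measured.
 */
final class FtpGreetingTimer {

    /** Result of one probe: the greeting line and the elapsed time in milliseconds. */
    static final class Probe {
        final String greeting;
        final long elapsedMillis;

        Probe(String greeting, long elapsedMillis) {
            this.greeting = greeting;
            this.elapsedMillis = elapsedMillis;
        }

        /** FTP servers signal "service ready" with reply code 220 (RFC 959). */
        boolean ready() {
            return greeting != null && greeting.startsWith("220");
        }
    }

    static Probe probe(String host, int port, int timeoutMillis) throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis); // bound the wait for the greeting
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
            String greeting = in.readLine(); // first line of the server's welcome reply
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            return new Probe(greeting, elapsed);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host/port: substitute the (redacted) values from the job config.
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 21;
        Probe p = probe(host, port, 10_000);
        System.out.println("greeting=" + p.greeting + " ready=" + p.ready()
                + " elapsed=" + p.elapsedMillis + " ms");
    }
}
```

If this probe returns quickly while the SeaTunnel job is still slow, the delay is more likely in how often connections are opened than in the network itself.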

SeaTunnel Version

dev-2.3.6

SeaTunnel Config

{
    "env": {
        "job.name": "XML file output",
        "job.mode": "batch"
    },
    "preHandler": [

    ],
    "source": [
        {
            "plugin_name": "Jdbc",
            "driver": "com.mysql.cj.jdbc.Driver",
            "connection_check_timeout_sec": 100,
            "table_list": [
                {
                    "table_path": "test_data.device",
                    "query": "SELECT\n `device_id`,\n `name`,\n `type`,\n `longitude`,\n `latitude`,\n `height`,\n `radius`,\n `distance`,\n `service_address`,\n `status`,\n `term_type`,\n `properties`,\n `runway_name`,\n `direction`,\n `runway_code`,\n `delay`\nFROM\n `device`"
                }
            ],
            "database": "test_data",
            "url": "jdbc:mysql://******:3306/test_data?remarks=true&useInformationSchema=true&useCursorFetch=true&defaultFetchSize=2048&rewriteBatchedStatements=true",
            "user": "******",
            "password": "******",
            "result_table_name": "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
        }
    ],
    "transform": [

    ],
    "sink": [
        {
            "file_format_type": "xml",
            "custom_filename": true,
            "file_name_expression": "xml_test",
            "is_enable_transaction": false,
            "xml_root_tag": "RECORDS",
            "xml_row_tag": "RECORD",
            "xml_use_attr_format": false,
            "batch_size": 1000000000,
            "plugin_name": "FtpFile",
            "host": "******",
            "port": "******",
            "user": "******",
            "password": "******",
            "tmp_path": "/ottomi/tmp/ottomi",
            "path": "/ottomi/file-node/download/1793861143369256962/xml/",
            "result_table_name": "ot_16aad011b9314e15977921dac312ca5f",
            "source_table_name": [
                "ot_b7ba264ac3a84eb4b4d1b3bb93373a20"
            ]
        }
    ]
}

Running Command

bin/seatunnel.sh -c ftp.json

Error Exception

Even with only a small amount of data, the task takes several minutes to complete, or hangs for a long time without any response until the client disconnects:



java.lang.RuntimeException: org.apache.hadoop.fs.ftp.FTPException: Client not connected
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:262)
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:68)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:703)
at

Zeta or Flink or Spark Version

No response

Java or Scala Version

1.8

Screenshots


Are you willing to submit PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

Xuzhengz, Jun 21 '24 13:06

Compared with other file connectors such as S3 and local files, which both read and write quickly, FTP is particularly slow.

Xuzhengz, Jun 21 '24 13:06
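One hypothesis consistent with the stack trace, whose exception comes from `org.apache.hadoop.fs.ftp`, is that Hadoop's `FTPFileSystem`, which the connector appears to use, opens and closes a fresh FTP connection inside most file-system calls (for example `listStatus`, `getFileStatus`, `mkdirs`, `rename`, `delete`). If so, the few-second connect time is paid once per call rather than once per job, which S3 and local files would not suffer in the same way. The toy Java model below does not use SeaTunnel or Hadoop code; it only counts connections under the two strategies. The class `FtpConnectModel`, the operation names and the 3-second connect cost are all hypothetical, chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (not SeaTunnel or Hadoop code) comparing two ways of issuing
 * FTP file-system calls: a fresh connection per call versus one reused
 * session. Connects are counted rather than slept, so it runs instantly.
 */
final class FtpConnectModel {

    /** Fake connection factory that only counts how often it is asked to connect. */
    static final class CountingConnector {
        int connects = 0;

        String connect() {
            connects++;
            return "session-" + connects;
        }
    }

    /** Strategy A: connect and log in before each operation, disconnect after it. */
    static List<String> perOperation(CountingConnector c, List<String> ops) {
        List<String> log = new ArrayList<>();
        for (String op : ops) {
            String session = c.connect(); // full handshake for every call
            log.add(op + "@" + session);
            // a real client would disconnect here
        }
        return log;
    }

    /** Strategy B: connect once and reuse the same session for every operation. */
    static List<String> reused(CountingConnector c, List<String> ops) {
        List<String> log = new ArrayList<>();
        String session = c.connect(); // single handshake
        for (String op : ops) {
            log.add(op + "@" + session);
        }
        return log;
    }

    /** Estimated wall-clock time spent only on connecting, in milliseconds. */
    static long connectOverheadMillis(int connects, long millisPerConnect) {
        return connects * millisPerConnect;
    }

    public static void main(String[] args) {
        // Hypothetical sequence of file-system calls a file sink might make.
        List<String> ops = Arrays.asList(
                "mkdirs tmp_path", "create tmp file", "getFileStatus tmp file",
                "mkdirs path", "rename to path", "delete tmp dir");
        long assumedConnectMillis = 3_000; // "a few seconds" per connect, assumed

        CountingConnector perOp = new CountingConnector();
        perOperation(perOp, ops);
        CountingConnector shared = new CountingConnector();
        reused(shared, ops);

        System.out.println("per-operation: " + perOp.connects + " connects, ~"
                + connectOverheadMillis(perOp.connects, assumedConnectMillis) + " ms");
        System.out.println("reused:        " + shared.connects + " connects, ~"
                + connectOverheadMillis(shared.connects, assumedConnectMillis) + " ms");
    }
}
```

With the assumed 3 s connect time, this prints 6 connects (~18000 ms) for the per-operation strategy and 1 connect (~3000 ms) for the reused one. The overhead grows linearly with the number of file-system calls, which would be consistent with a job that writes one record yet runs for minutes.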

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot], Jul 22 '24 00:07

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

github-actions[bot], Jul 30 '24 00:07

This issue has been closed because it has not received a response for too long. You can reopen it if you encounter similar problems in the future.

github-actions[bot], Jul 12 '25 00:07