
Add FTPOperator and FTPSOperator

Status: Open. RachitSharma2001 opened this issue 2 years ago (3 comments).

Description

Airflow already has an SFTPOperator (see its documentation), which provides an easy way to read data from an SFTP server to the local disk or to write data to an SFTP server. However, no such operator exists for FTP and FTPS servers; there are only FTP hooks and sensors (look in the airflow/airflow/providers/ftp directory and compare it with airflow/airflow/providers/sftp).

Use case/motivation

I would like to provide the following two operators for Airflow developers:

```python
# Proposed API: neither operator exists yet.
FTPOperator(
    task_id="operation",
    ftp_conn_id="ftp_default",
    local_filepath="route_to_local_file",
    remote_filepath="remote_route_to_copy",
    operation="put",
    dag=dag,
)
FTPSOperator(
    task_id="operation",
    ftps_conn_id="ftps_default",
    local_filepath="route_to_local_file",
    remote_filepath="remote_route_to_copy",
    operation="put",
    dag=dag,
)
```

The FTPOperator would connect to an unencrypted FTP server and would either copy a file from that server to the local disk (if the operation is "get") or copy a file on the local disk to the server (if the operation is "put"). The FTPSOperator would do the same for an FTP server secured with TLS (FTPS).
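As a rough illustration of the "get"/"put" behavior described above, here is a minimal sketch built on Python's standard-library `ftplib`, not on Airflow's hooks. The helper names `open_connection` and `transfer_file` are hypothetical and not part of any Airflow API; the sketch assumes binary transfers and plain username/password login. `use_tls=True` stands in for the proposed FTPSOperator: `ftplib.FTP_TLS.login()` secures the control channel, and `prot_p()` additionally encrypts the data channel.

```python
import ftplib
from typing import Literal

# Hypothetical helpers sketching what the proposed operators might do internally.


def open_connection(host: str, user: str, passwd: str, use_tls: bool = False) -> ftplib.FTP:
    """Connect and log in; use_tls=True corresponds to the proposed FTPSOperator."""
    if use_tls:
        ftp = ftplib.FTP_TLS(host)
        ftp.login(user, passwd)  # FTP_TLS.login() secures the control channel before sending credentials
        ftp.prot_p()  # also switch the data channel to TLS
    else:
        ftp = ftplib.FTP(host)
        ftp.login(user, passwd)
    return ftp


def transfer_file(
    ftp: ftplib.FTP,
    local_filepath: str,
    remote_filepath: str,
    operation: Literal["get", "put"],
) -> None:
    """Copy remote -> local for "get", or local -> remote for "put"."""
    if operation == "get":
        with open(local_filepath, "wb") as local_file:
            # RETR in binary mode; each received chunk is written straight to disk
            ftp.retrbinary(f"RETR {remote_filepath}", local_file.write)
    elif operation == "put":
        with open(local_filepath, "rb") as local_file:
            # STOR in binary mode; ftplib reads the file object in blocks
            ftp.storbinary(f"STOR {remote_filepath}", local_file)
    else:
        raise ValueError(f"operation must be 'get' or 'put', got {operation!r}")
```

For example, `transfer_file(open_connection("ftp.example.com", "user", "secret"), "/tmp/report.csv", "/upload/report.csv", "put")` would upload a local file (host, credentials, and paths are placeholders). In Airflow itself the connection details would presumably come from an Airflow connection via the existing FTP hooks rather than raw credentials.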

Related issues

No response

Are you willing to submit a PR?

  • [X] Yes I am willing to submit a PR!

Code of Conduct

RachitSharma2001, Sep 20 '22 17:09

Thanks for opening your first issue here! Be sure to follow the issue template!

boring-cyborg[bot], Sep 20 '22 17:09

Sounds great! Feel free to submit a PR.

josh-fell, Sep 20 '22 18:09

FYI, at a previous company I implemented such operators, and at a bare minimum I found it important to check that the size of the transferred file matched the size the remote server reported: https://docs.python.org/3/library/ftplib.html#ftplib.FTP.size. In my experience, FTP was more susceptible to things going wrong mid-transfer.
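A minimal sketch of that size check, assuming it runs right after a "get" or "put" on an already logged-in `ftplib.FTP` connection. `ftplib.FTP.size()` returns an integer for a `213` reply and `None` for other success replies, while a `5xx` reply (command unsupported) raises `ftplib.error_perm`. SIZE is an extension (RFC 3659) that not every server implements, and some refuse it in ASCII mode, so the sketch switches to binary first. `check_transfer_size`, `remote_size`, `TransferVerificationError`, and the `strict` flag are hypothetical names chosen for illustration.

```python
import ftplib
import os
from typing import Optional


class TransferVerificationError(Exception):
    """Raised when a transfer cannot be verified or the sizes disagree."""


def remote_size(ftp: ftplib.FTP, remote_filepath: str) -> Optional[int]:
    """Return the server-reported size in bytes, or None if it is unavailable."""
    ftp.voidcmd("TYPE I")  # binary mode: some servers refuse SIZE in ASCII mode
    try:
        return ftp.size(remote_filepath)  # None for a success reply other than 213
    except ftplib.error_perm:
        return None  # 5xx reply: SIZE unknown or not permitted


def check_transfer_size(
    ftp: ftplib.FTP, local_filepath: str, remote_filepath: str, strict: bool = False
) -> bool:
    """Return True if sizes match, False if the server reports no size (unless strict)."""
    size = remote_size(ftp, remote_filepath)
    if size is None:
        if strict:
            raise TransferVerificationError(f"server reported no size for {remote_filepath}")
        return False
    local = os.path.getsize(local_filepath)
    if local != size:
        raise TransferVerificationError(
            f"{local_filepath}: local {local} bytes != remote {size} bytes"
        )
    return True
```

Returning `False` instead of raising when the server cannot report a size is a judgment call; `strict=True` makes an unverifiable transfer fail the task instead.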

I also implemented other integrity checks, such as checking whether the server supported HASH algorithms, but I found that supporting features like that required a fair amount of battle testing against many different FTP servers, because you could get very unexpected results (such as a server claiming to support HASH but only ever returning 0, or returning the hash with an extra prefix or suffix string).
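The defensive parsing described above could look roughly like the sketch below. The HASH command comes from an IETF Internet-Draft rather than a ratified RFC, so reply formats vary; the sketch does not rely on any fixed layout and instead looks for a standalone run of hex characters of the expected digest length, rejecting all-zero digests. This is a heuristic: it assumes the server's active HASH algorithm already matches `algorithm`, and a filename that itself contains such a hex run could confuse it. `extract_digest`, `local_digest`, and `verify_hash` are hypothetical names.

```python
import ftplib
import hashlib
import re
from typing import Optional

# Expected hex-digest length for a few hashlib algorithm names.
_HEX_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64, "sha512": 128}


def extract_digest(response: str, algorithm: str = "sha256") -> Optional[str]:
    """Find a hex digest of the right length in a HASH reply, ignoring extra prefixes/suffixes."""
    length = _HEX_LENGTHS[algorithm]
    # Exactly `length` hex characters that are not part of a longer hex run.
    pattern = rf"(?<![0-9a-fA-F])[0-9a-fA-F]{{{length}}}(?![0-9a-fA-F])"
    for candidate in re.findall(pattern, response):
        if set(candidate) != {"0"}:  # treat an all-zero digest as no answer
            return candidate.lower()
    return None


def local_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a local file in 64 KiB chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_hash(
    ftp: ftplib.FTP, local_filepath: str, remote_filepath: str, algorithm: str = "sha256"
) -> Optional[bool]:
    """True/False if digests could be compared; None if the server gave nothing usable."""
    try:
        # Assumption: the server's current HASH algorithm already matches `algorithm`.
        response = ftp.sendcmd(f"HASH {remote_filepath}")
    except (ftplib.error_perm, ftplib.error_temp):
        return None  # HASH unsupported or temporarily unavailable
    remote = extract_digest(response, algorithm)
    if remote is None:
        return None
    return remote == local_digest(local_filepath, algorithm)
```

Returning `None` rather than `False` keeps "server could not answer" separate from "file is corrupt", which matters given how inconsistent HASH support is.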

notatallshaw-gts, Sep 20 '22 19:09