Redundant slash in GCS object URI if wildcard in source_path and no destination_path given in SFTPToGCSOperator
Apache Airflow Provider(s)
google
Versions of Apache Airflow Providers
apache-airflow==2.7.3
apache-airflow-providers-celery==3.4.1
apache-airflow-providers-cncf-kubernetes==7.8.0
apache-airflow-providers-common-sql==1.8.0
apache-airflow-providers-ftp==3.6.0
apache-airflow-providers-google==10.11.0
apache-airflow-providers-hashicorp==3.5.0
apache-airflow-providers-http==4.6.0
apache-airflow-providers-sftp==4.7.0
apache-airflow-providers-ssh==3.8.1
Apache Airflow version
2.7.3
Operating System
Python 3.11.8, Debian 11 (bullseye)
Deployment
Docker-Compose
Deployment details
No response
What happened
Hi,

When using `SFTPToGCSOperator` (docs) without the optional `destination_path` param and with a wildcard (`*`) in the `source_path` param, a redundant forward slash (`/`) is left between the bucket name and the object name after uploading to GCS. Please see the log entry below:
```
[2024-08-26, 15:23:25 UTC] {sftp_to_gcs.py:149} INFO - Executing copy of /home/sftp_user/data/sample_file_01.txt to gs://sftp-test-bucket-240826//sample_file_01.txt
```
This is how it looks in the GCP Cloud Console (screenshot omitted).
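For illustration, here is a minimal sketch of the string mechanics that appear to produce the doubled slash. This is an assumption about the cause, not the provider's actual code: if the omitted `destination_path` is normalized to an empty string and the remote file path keeps its leading slash once the base directory is stripped, the resulting object name starts with `/`.

```python
# Hypothetical illustration of the suspected cause; variable names here
# are assumptions, not the provider's implementation.
source_path = '/home/sftp_user/data/sample_file_*.txt'
base_path = '/home/sftp_user/data'   # directory part before the wildcard
remote_file = '/home/sftp_user/data/sample_file_01.txt'
destination_path = ''                # plausible default when the param is omitted

# Stripping base_path leaves the path separator behind, so the object
# name begins with a slash.
destination_object = remote_file.replace(base_path, destination_path, 1)
print(destination_object)            # /sample_file_01.txt
print(f'gs://sftp-test-bucket-240826/{destination_object}')
# gs://sftp-test-bucket-240826//sample_file_01.txt  <- doubled slash
```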
What you think should happen instead
There should not be any extra forward slash when files are placed at the bucket root, i.e. when the `source_path` param contains a wildcard (`*`) and the `destination_path` param is omitted.
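A minimal sketch of one possible fix, assuming the object name is composed as in the illustration above: strip any leading slash from the computed object name before building the GCS URI.

```python
# Hypothetical fix sketch; destination_object is the assumed variable
# from the illustration above.
destination_object = destination_object.lstrip('/')
print(f'gs://sftp-test-bucket-240826/{destination_object}')
# gs://sftp-test-bucket-240826/sample_file_01.txt
```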
How to reproduce
Sample task definition:
```python
from airflow import DAG
from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator
from airflow.utils.dates import days_ago

with DAG(
    dag_id='test_dag',
    start_date=days_ago(1),
) as dag:
    # destination_path is intentionally omitted and source_path
    # contains a wildcard, which triggers the redundant slash.
    task = SFTPToGCSOperator(
        task_id='test',
        source_path='/home/sftp_user/data/sample_file_*.txt',
        destination_bucket='sftp-test-bucket-240826',
    )
```
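To confirm the misnamed object after the DAG runs, one can list the bucket's objects. This sketch assumes the `google-cloud-storage` client library is installed and credentials are configured:

```python
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs('sftp-test-bucket-240826'):
    print(repr(blob.name))  # expected: '/sample_file_01.txt' (leading slash)
```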
Anything else
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct