airflow
Redundant slash in GCS object URI if wildcard in source_path and no destination_path given in SFTPToGCSOperator
Apache Airflow Provider(s)
Versions of Apache Airflow Providers
apache-airflow==2.7.3
apache-airflow-providers-celery==3.4.1
apache-airflow-providers-cncf-kubernetes==7.8.0
apache-airflow-providers-common-sql==1.8.0
apache-airflow-providers-ftp==3.6.0
apache-airflow-providers-google==10.11.0
apache-airflow-providers-hashicorp==3.5.0
apache-airflow-providers-http==4.6.0
apache-airflow-providers-sftp==4.7.0
apache-airflow-providers-ssh==3.8.1
Apache Airflow version
2.7.3
Operating System
Python 3.11.8, Debian 11 (bullseye)
Deployment
Docker-Compose
Deployment details
No response
What happened
Hi,
when using SFTPToGCSOperator without the optional destination_path param and with a wildcard (*) in the source_path param, a redundant forward slash (/) is left between the bucket name and the object name after the upload to GCS. Please see the log entry below:
[2024-08-26, 15:23:25 UTC] {sftp_to_gcs.py:149} INFO - Executing copy of /home/sftp_user/data/sample_file_01.txt to gs://sftp-test-bucket-240826//sample_file_01.txt
This is how it looks in GCP Cloud Console:
What you think should happen instead
There shouldn't be any extra forward slash when files are placed at the root of the bucket, that is, when source_path contains a wildcard (*) and destination_path is omitted.
How to reproduce
Sample task definition:
from airflow.utils.dates import days_ago
from airflow import DAG
from airflow.providers.google.cloud.transfers.sftp_to_gcs import SFTPToGCSOperator

with DAG(
    dag_id='test_dag',
    start_date=days_ago(1),
) as dag:
    task = SFTPToGCSOperator(
        task_id='test',
        source_path='/home/sftp_user/data/sample_file_*.txt',
        destination_bucket='sftp-test-bucket-240826',
    )
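The symptom can also be reproduced without Airflow using plain string operations. The sketch below is only a guess at the mechanism: it assumes the operator builds the object name by replacing the source directory with destination_path (an empty string when the param is omitted), which I have not confirmed against the provider source. The function names `naive_object_name` and `normalized_object_name` are hypothetical and do not exist in the provider.

```python
import os

WILDCARD = "*"


def naive_object_name(file_path: str, source_path: str, destination_path: str = "") -> str:
    """Hypothetical helper: swap the source directory for destination_path.

    Assumption: this mirrors how the object name is derived when source_path
    contains a wildcard. It is not copied from the provider code.
    """
    # Everything before the wildcard, e.g. '/home/sftp_user/data/sample_file_'
    prefix = source_path.split(WILDCARD, 1)[0]
    # Directory part of that prefix, e.g. '/home/sftp_user/data'
    base_path = os.path.dirname(prefix)
    # With an empty destination_path, the leading '/' of the remainder survives
    return file_path.replace(base_path, destination_path, 1)


def normalized_object_name(file_path: str, source_path: str, destination_path: str = "") -> str:
    """Hypothetical fix: drop any leading slash from the derived object name."""
    return naive_object_name(file_path, source_path, destination_path).lstrip("/")


source_path = "/home/sftp_user/data/sample_file_*.txt"
file_path = "/home/sftp_user/data/sample_file_01.txt"
bucket = "sftp-test-bucket-240826"

# Reproduces the URI from the log entry, with the redundant slash
naive_uri = f"gs://{bucket}/{naive_object_name(file_path, source_path)}"
# Same input with the leading slash stripped
fixed_uri = f"gs://{bucket}/{normalized_object_name(file_path, source_path)}"

print(naive_uri)
print(fixed_uri)
```

If the assumption holds, stripping the leading slash from the derived object name would be enough, and it leaves names unchanged when a non-empty destination_path such as `incoming` is given.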
Anything else
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.