Get files from SFTP server using a file name pattern
Feature Description
SFTP connector with pattern search using regex.
What would you like to see added to Sling?
The current SFTP connector works perfectly for one file (e.g. abc.csv) or for multiple files (abc_def.csv or abcdef.csv). However, it does not seem to accept a pattern like ^abc_\d{4}-\d{2}-\d{2}\.csv$
Is there a way to solve this in Sling?
Regexes are not currently accepted, but globs are.
Can you try abc_????-??-??.csv?
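For reference, here is a quick way to compare the regex from the request with the suggested glob. It uses Python's fnmatch as a stand-in glob matcher, not Sling's own matcher, and whether Sling accepts character classes like [0-9] is an assumption that has not been verified in this thread. The point it shows: each ? matches exactly one character of any kind, so the plain glob is looser than \d.

```python
import fnmatch
import re

# Regex from the request, with the dot escaped so it only matches a literal ".".
REGEX = re.compile(r"^abc_\d{4}-\d{2}-\d{2}\.csv$")
# Suggested glob: each "?" matches exactly one character (digit or not).
GLOB = "abc_????-??-??.csv"
# Hypothetical stricter glob using character classes (Sling support NOT verified).
GLOB_DIGITS = "abc_[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9].csv"


def matches(name: str) -> tuple[bool, bool, bool]:
    """Return (regex match, plain glob match, digit-class glob match) for one file name."""
    return (
        REGEX.match(name) is not None,
        fnmatch.fnmatchcase(name, GLOB),
        fnmatch.fnmatchcase(name, GLOB_DIGITS),
    )


for name in ["abc_2024-08-13.csv", "abc_abcd-ef-gh.csv", "abc_def.csv"]:
    print(name, matches(name))
```

With these names the regex and the [0-9] glob agree, while the plain ? glob also accepts abc_abcd-ef-gh.csv.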
It does not work. Here is the error; it seems to stop after the _:
abc_" file does not exist
Sling command failed:
abc_*.csv works, but we have some files with similar names, so we want a more granular approach.
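To illustrate why abc_*.csv is too broad, here is a small sketch over a hypothetical folder listing (the file names are made up, not taken from this thread). It again uses Python's fnmatch rather than Sling's matcher, so treat it only as a picture of standard glob semantics.

```python
import fnmatch

# Hypothetical names in one SFTP folder (made up for illustration).
files = [
    "abc_2024-08-13.csv",
    "abc_2024-08-14.csv",
    "abc_def.csv",
    "abc_2024-08-13_backup.csv",
]


def select(pattern: str) -> list[str]:
    """Keep names matching the glob; fnmatchcase is case-sensitive on every OS."""
    return [f for f in files if fnmatch.fnmatchcase(f, pattern)]


broad = select("abc_*.csv")            # "*" matches any run of characters, so every file matches
narrow = select("abc_????-??-??.csv")  # each "?" matches exactly one character

print(broad)
print(narrow)
```

The fixed-width ? pattern drops abc_def.csv and the _backup file, which is the granularity asked for here.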
Here are some more similar error lines:
4:27PM INF [1 / 1] running stream sftp://transfers2.vendor.com/my_company/abc_?.csv
4:27PM INF connecting to target database (snowflake)
4:27PM INF reading from source file system (sftp)
4:27PM INF execution failed
4:27PM INF ~ error listing path: "my_company/abc_" file does not exist
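The path in the error, "my_company/abc_", looks like the pattern was cut at its first wildcard and the leftover prefix was then listed as if it were a directory. Below is a minimal sketch of that guess; it assumes nothing about Sling's actual code, and both function names are hypothetical. Trimming back to the last / before the wildcard yields a directory that can actually be listed.

```python
WILDCARDS = "*?["  # characters that start a glob token


def naive_list_path(pattern: str) -> str:
    """Cut the pattern at its first wildcard character; this reproduces the failing path."""
    positions = [pattern.find(c) for c in WILDCARDS if c in pattern]
    return pattern[: min(positions)] if positions else pattern


def directory_list_path(pattern: str) -> str:
    """Trim back to the last '/' before the first wildcard ('' means the base directory)."""
    prefix = naive_list_path(pattern)
    return prefix[: prefix.rfind("/") + 1]


for p in ["my_company/abc_?.csv", "my_folder/SITE_????-??-??.csv"]:
    print(repr(naive_list_path(p)), "->", repr(directory_list_path(p)))
```

The naive prefixes match the two error messages in this thread exactly, which is what makes this explanation plausible.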
Should be good now with https://github.com/slingdata-io/sling-cli/pull/344/commits/5f9d3a5e7aed59ba79c4089d16e0c1d3eae0b9b0 Feel free to build on that branch and test. Closing.
sling conns discover local -p 'sling-cli/core/dbio/iop/???.go'
+---+-----------------------------------+------+---------+--------------------------------+
| # | NAME | TYPE | SIZE | LAST UPDATED (UTC) |
+---+-----------------------------------+------+---------+--------------------------------+
| 1 | sling-cli/core/dbio/iop/csv.go | file | 14 KiB | 2024-04-15 14:48:13 (120d ago) |
| 2 | sling-cli/core/dbio/iop/ssh.go | file | 8.5 KiB | 2024-08-13 10:16:20 (15h ago) |
+---+-----------------------------------+------+---------+--------------------------------+
This works when the file name is only a pattern, e.g. ???.csv returns asc.csv.
When we add a prefix or suffix string, it does not work. For example, "abc_???.csv" does not return abc_def.csv; it fails saying no file was found.
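Under ordinary glob rules both patterns should match, since ? only counts characters and a literal prefix like abc_ is matched as-is. The sketch below lists the pattern's parent directory first and then filters entry names, which is one common way to resolve a glob against a remote file system. REMOTE stands in for an SFTP directory listing and glob_remote is a hypothetical helper, not a Sling function; it assumes wildcards appear only in the last path segment.

```python
import fnmatch
import posixpath

# Hypothetical remote listing: directory -> entry names (stand-in for an SFTP listdir call).
REMOTE = {
    "my_company": ["asc.csv", "abc_def.csv", "abc_defg.csv", "xyz.csv"],
}


def glob_remote(pattern: str) -> list[str]:
    """List the pattern's parent directory, then keep entries whose names match the base-name glob."""
    directory, name_glob = posixpath.split(pattern)
    entries = REMOTE.get(directory, [])
    return [posixpath.join(directory, n) for n in entries if fnmatch.fnmatchcase(n, name_glob)]


print(glob_remote("my_company/???.csv"))      # ['my_company/asc.csv', 'my_company/xyz.csv']
print(glob_remote("my_company/abc_???.csv"))  # ['my_company/abc_def.csv']
```

Because the directory is listed before any matching happens, a prefix such as abc_ never has to exist as a path on the server.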
OK, applied https://github.com/slingdata-io/sling-cli/pull/357/commits/ffe5a02fe541426856a3c724ce28c1b90240ce3c Can you try a dev build?
https://f.slingdata.io/dev/latest/sling_linux_amd64.tar.gz https://f.slingdata.io/dev/latest/sling_darwin_arm64.tar.gz
I am using version v1.2.16.dev.
The replication is not working:
~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)
---------------------------
sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv
---------------------------
~ error listing path: "my_folder/SITE_" file does not exist
It looks like it is stopping after the _.
Can you share your replication?
source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate

streams:
  "sftp://transfers2.mysftp.com/myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: true

env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp
Can you try again with the latest build?
Same error as before when I tried with the dev build:
Version: 1.2.16.dev (2024-08-22)
~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)
---------------------------
sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv
---------------------------
~ error listing path: "my_folder/SITE_" file does not exist
The version should be: Version 1.2.16.dev (2024-08-25)
You have to download it again.
Also, remove single: true:
source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate

streams:
  "myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: false

env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp
This works; I am able to get the files now.