Get files from SFTP server using pattern of files

Open AashishMTech opened this issue 1 year ago • 3 comments

Feature Description

SFTP connector with pattern search using regex.

What would you like to see added to Sling?

The current SFTP connector works perfectly for a single file (e.g. abc.csv) or for multiple files (abc_def.csv or abcdef.csv). However, it does not seem to accept a pattern like ^abc_\d{4}-\d{2}-\d{2}.csv$

Is there a way of solving this in Sling?

AashishMTech avatar Aug 08 '24 12:08 AashishMTech

Regex is not currently accepted, but glob is.

Can you try abc_????-??-??.csv?
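For readers comparing the two pattern styles, here is a minimal sketch of how the suggested glob relates to the regex from the issue. It uses Python's standard `fnmatch` and `re` modules purely as an illustration of glob semantics; it is not Sling's matcher, the file names are made up, and the regex is written with an escaped dot, which the original did not have.

```python
import re
from fnmatch import fnmatchcase

# Regex from the issue, with the dot escaped (an assumption about the intent).
REGEX = re.compile(r"^abc_\d{4}-\d{2}-\d{2}\.csv$")
# Glob suggested in the thread: each "?" matches exactly one character.
GLOB = "abc_????-??-??.csv"

# Hypothetical file names, for illustration only.
names = [
    "abc_2024-08-08.csv",
    "abc_def.csv",
    "abcdef.csv",
    "abc_XXXX-YY-ZZ.csv",
]

glob_hits = [n for n in names if fnmatchcase(n, GLOB)]
regex_hits = [n for n in names if REGEX.match(n)]

print(glob_hits)   # ['abc_2024-08-08.csv', 'abc_XXXX-YY-ZZ.csv']
print(regex_hits)  # ['abc_2024-08-08.csv']
```

The glob is looser than the regex: `?` accepts any single character, not only a digit, so it narrows by length and separator position rather than by character type.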

flarco avatar Aug 08 '24 12:08 flarco

It does not work. Here is the error; it skips everything after the underscore:

abc_"
file does not exist
Sling command failed:

abc_*.csv works, but we have some files with similar names, hence we wanted a more granular approach.
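The granularity concern can be sketched as follows, again using Python's `fnmatch` only to illustrate glob behaviour (not Sling's own implementation). The listing is hypothetical; the point is that `*` picks up similarly named files that a fixed-width `?` pattern excludes.

```python
from fnmatch import fnmatchcase

# Hypothetical directory listing with similarly named files.
listing = [
    "abc_2024-08-08.csv",
    "abc_summary.csv",
    "abc_2024-08-08_backup.csv",
]

# "*" matches any run of characters, so every abc_ file is selected.
broad = [n for n in listing if fnmatchcase(n, "abc_*.csv")]
# "?" matches exactly one character, so only the dated name fits.
narrow = [n for n in listing if fnmatchcase(n, "abc_????-??-??.csv")]

print(broad)
print(narrow)
```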

AashishMTech avatar Aug 08 '24 12:08 AashishMTech

here are some more similar error lines

4:27PM INF [1 / 1] running stream sftp://transfers2.vendor.com/my_company/abc_?.csv
4:27PM INF connecting to target database (snowflake)
4:27PM INF reading from source file system (sftp)
4:27PM INF execution failed
4:27PM INF ~ error listing path: "my_company/abc_" file does not exist

AashishMTech avatar Aug 08 '24 14:08 AashishMTech

Should be good now with https://github.com/slingdata-io/sling-cli/pull/344/commits/5f9d3a5e7aed59ba79c4089d16e0c1d3eae0b9b0 Feel free to build on that branch and test. Closing.

sling conns discover local -p 'sling-cli/core/dbio/iop/???.go'
+---+-----------------------------------+------+---------+--------------------------------+
| # | NAME                              | TYPE | SIZE    | LAST UPDATED (UTC)             |
+---+-----------------------------------+------+---------+--------------------------------+
| 1 | sling-cli/core/dbio/iop/csv.go    | file | 14 KiB  | 2024-04-15 14:48:13 (120d ago) |
| 2 | sling-cli/core/dbio/iop/ssh.go    | file | 8.5 KiB | 2024-08-13 10:16:20 (15h ago)  |
+---+-----------------------------------+------+---------+--------------------------------+

flarco avatar Aug 14 '24 01:08 flarco

This works when the pattern is the whole file name; for example, ???.csv returns asc.csv.

But when we prefix or suffix it with a string, it does not work. For example, "abc_???.csv" does not return abc_def.csv; it fails saying no file was found.

AashishMTech avatar Aug 15 '24 12:08 AashishMTech

OK, applied https://github.com/slingdata-io/sling-cli/pull/357/commits/ffe5a02fe541426856a3c724ce28c1b90240ce3c Can you try a dev build?

https://f.slingdata.io/dev/latest/sling_linux_amd64.tar.gz
https://f.slingdata.io/dev/latest/sling_darwin_arm64.tar.gz

flarco avatar Aug 19 '24 21:08 flarco

I am using Version v1.2.16.dev

Replication not working

~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)
--------------------------- sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv ---------------------------
~ error listing path: "my_folder/SITE_" file does not exist

Looks like it's skipping everything after the underscore.

AashishMTech avatar Aug 20 '24 13:08 AashishMTech

Can you share your replication?

flarco avatar Aug 20 '24 13:08 flarco

source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate


streams:
  "sftp://transfers2.mysftp.com/myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: true
    
env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp
  

AashishMTech avatar Aug 20 '24 13:08 AashishMTech

Can you try again with the latest build?

flarco avatar Aug 25 '24 09:08 flarco

Same error as before when I tried with the dev build.

Version: 1.2.16.dev (2024-08-22)

~ failure running replication (see docs @ https://docs.slingdata.io/sling-cli)
--------------------------- sftp://transfers2.mysftp.com/my_folder/SITE_????-??-??.csv ---------------------------
~ error listing path: "my_folder/SITE_" file does not exist

AashishMTech avatar Aug 26 '24 07:08 AashishMTech

The version should be 1.2.16.dev (2024-08-25). You have to download again.

Also, remove single: true:

source: MY_SFTP
target: SNOWFLAKE

defaults:
  mode: truncate


streams:
  "myfolder/SITE_????-??-??.csv":
    object: 'myschema.site'
    single: false

env:
  SAMPLE_SIZE: 2000 # increase the sample size to infer types (default=900).
  SLING_STREAM_URL_COLUMN: true # adds a _sling_stream_url column with file path
  SLING_LOADED_AT_COLUMN: timestamp

flarco avatar Aug 26 '24 09:08 flarco

This works, I am able to get files now.

AashishMTech avatar Aug 26 '24 10:08 AashishMTech