Is it possible to create a pipeline on an Azure Blob Storage account for blob files?

Open Ravi733499 opened this issue 2 years ago • 5 comments

I'm looking for a way to read a file from Azure Blob Storage and push it to a Snowflake database whenever a new file lands in an Azure Blob Storage container. At present, it seems the existing input component does not work that way: it reads all the messages and then shuts down the service.

Ravi733499 avatar Jul 20 '23 20:07 Ravi733499

Hey! 👋 If you're happy to delete the files from Azure Blob Storage once they've been uploaded and it's not a huge issue to keep polling it for new files every interval, then here's a potential solution using read_until, sequence and generate:

input:
  read_until:
    # Keep reading in a loop
    check: false
    restart_input: true
    input:
      sequence:
        inputs:
          - generate:
              mapping: root = ""
              # Specify how frequently you'd like to poll Blob Storage for new
              # files
              interval: 5s
              # Need to emit 2 messages so the configured `interval` lapses once
              # because the first message is emitted immediately
              count: 2
            processors:
              # Get rid of the messages generated by this input, since we don't
              # need them
              - mapping: root = deleted()

          - azure_blob_storage:
              # Delete the read file once it's processed so it won't get picked
              # up during the next iteration
              delete_objects: true
              # TODO

output:
  snowflake_put:
    # TODO

A better solution would be to have support for Event Grid in the azure_blob_storage input similar to the SQS integration in aws_s3, but it requires some investigation. PRs are welcome.

mihaitodor avatar Jul 21 '23 01:07 mihaitodor

Can I work on it?

vivekprm avatar Jul 26 '23 12:07 vivekprm

@vivekprm Sure, like I said, PRs are welcome! Feel free to reach out if you need any insight into the code

mihaitodor avatar Jul 26 '23 13:07 mihaitodor

Sure, thanks. I'll go through the code and ask if I have any questions.

Just to be sure: do we want something similar to this, where Event Grid is used to consume files from azure_blob_storage?

vivekprm avatar Jul 26 '23 13:07 vivekprm

@vivekprm Event Grid is only responsible for distributing messages from publishers to handlers. Blob Storage is one of those publishers. If you want to mimic the aws_s3 implementation with SQS, you need to consume events from one of the handlers, with Azure Queue Storage being the most similar to AWS SQS.

eduardodbr avatar Aug 13 '23 18:08 eduardodbr
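
For illustration, the event-driven approach described above could start from a config along these lines. This is only a rough sketch and not something proposed in the thread. It assumes that an Event Grid subscription already routes the container's events to a Queue Storage queue, that the queue name `blob-events` and the environment variable are placeholders, that each queue message arrives as a base64-encoded JSON event in the Event Grid schema (with `eventType`, `subject` and `data.url` fields), and that the input does not decode the payload itself. Downloading the referenced blob is left out.

```yaml
input:
  azure_queue_storage:
    # Placeholder credentials and queue name; the queue is assumed to be
    # registered as a handler for the container's Event Grid events
    storage_connection_string: "${AZURE_STORAGE_CONNECTION_STRING}"
    queue_name: blob-events

pipeline:
  processors:
    - mapping: |
        # Assumption: the payload is a base64-encoded Event Grid event
        let event = content().decode("base64").parse_json()

        # Keep only the fields needed to locate the new blob
        root.blob_url = $event.data.url
        root.subject = $event.subject

        # Drop every event that isn't a blob creation
        root = if $event.eventType != "Microsoft.Storage.BlobCreated" { deleted() }

output:
  # Print the extracted blob references for inspection
  stdout: {}
```

Unlike the polling config earlier in the thread, this only reacts when an event arrives, so files would not need to be deleted to avoid reprocessing. If the subscription uses the CloudEvents schema instead, the field names in the mapping would differ.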