streamx
Add support for GCS
Add support for Google Cloud Storage.
Hi @alunarbeach, this is on our roadmap and will be worked on soon. Meanwhile, it's not hard to get it working yourself. If you need to hack something together quickly, try these steps (it would be great if you could contribute too):
- Add the GCS Maven dependencies.
- Google provides its own Hadoop FileSystem implementation for GCS. See https://cloud.google.com/hadoop/google-cloud-storage-connector
- Change hdfs-site.xml to include the GCS-specific properties and authentication details.
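As a rough sketch of the hdfs-site.xml step, the Python snippet below renders a Hadoop `<configuration>` document containing GCS connector properties. The property names (`fs.gs.impl`, `fs.AbstractFileSystem.gs.impl`, `fs.gs.project.id`, `google.cloud.auth.service.account.*`) are assumed from the GCS connector's documentation for its older 1.x/2.x releases and may differ in the version you pull in. The project id and key-file path are hypothetical placeholders, and the Maven dependency step is not covered.

```python
import xml.etree.ElementTree as ET

# Assumed property names from the GCS connector's Hadoop settings (1.x/2.x era);
# verify them against the connector version you actually depend on.
GCS_PROPERTIES = {
    "fs.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
    "fs.AbstractFileSystem.gs.impl": "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
    "fs.gs.project.id": "my-gcp-project",  # hypothetical project id
    "google.cloud.auth.service.account.enable": "true",
    # Hypothetical path to a service-account JSON key.
    "google.cloud.auth.service.account.json.keyfile": "/path/to/key.json",
}


def build_hdfs_site(props: dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_hdfs_site(GCS_PROPERTIES))
```

Hand-editing the XML works just as well; the point is only which kind of entries the file needs.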
With the above steps, you should be able to get StreamX to write to GCS.
@alunarbeach I have added GCS support, tested it, and pushed the changes. See this commit: https://github.com/qubole/streamx/commit/de065fee48ff1a9cabd8e268318c8f4d99d47718. Please try it out and let us know if you see any issues.
For now, provide the GCS destination in the "s3.url" config itself; we will refactor StreamX later. In other words, use S3SinkConnector exactly as you would for S3, and just provide the GCS location in s3.url.
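A minimal sketch of what such a connector config might look like, written in Python so the `s3.url` value can be sanity-checked before deploying. Only `s3.url` and the use of S3SinkConnector come from this thread. The fully qualified class name `com.qubole.streamx.s3.S3SinkConnector`, the `hadoop.conf.dir`, `tasks.max`, `topics`, and `flush.size` keys, and all values (bucket, topic, paths) are assumptions to check against the StreamX README.

```python
from urllib.parse import urlparse

# Hypothetical connector config; keys other than s3.url are assumptions.
CONNECTOR_CONFIG = {
    "name": "gcs-sink",  # hypothetical connector name
    "connector.class": "com.qubole.streamx.s3.S3SinkConnector",  # assumed class name
    "tasks.max": "1",
    "topics": "my-topic",  # hypothetical topic
    "flush.size": "1000",
    # The GCS destination goes in s3.url even though the key says "s3".
    "s3.url": "gs://my-bucket/streamx",  # hypothetical bucket and prefix
    # Assumed key: directory holding the hdfs-site.xml with the GCS settings.
    "hadoop.conf.dir": "/etc/streamx/hadoop-conf",
}


def check_gcs_url(config: dict[str, str]) -> str:
    """Return the bucket name, or raise if s3.url is not a gs://bucket/... URL."""
    url = config["s3.url"]
    parsed = urlparse(url)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"s3.url must be a gs://bucket/... location, got {url!r}")
    return parsed.netloc


def to_properties(config: dict[str, str]) -> str:
    """Render the config as Kafka Connect style key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in config.items()) + "\n"


if __name__ == "__main__":
    print("bucket:", check_gcs_url(CONNECTOR_CONFIG))
    print(to_properties(CONNECTOR_CONFIG), end="")
```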
See https://github.com/qubole/streamx/blob/master/config/hadoop-conf/hdfs-site.xml for a sample config file.
Thanks
Thanks @PraveenSeluka, will try this out soon.