Request: Add Google Cloud Storage source
As a replacement for Logstash, we will need Vector to support Google Cloud Storage as a source, similar to AWS S3 (https://vector.dev/docs/reference/configuration/sources/aws_s3/).
We do something similar in Logstash:
input {
  google_cloud_storage {
    bucket_id => "my_log_bucket"
    file_matches => ".*\.log"
    tags => ["server"]
    codec => "json"
  }
}
Discord request: https://discord.com/channels/742820443487993987/746070591097798688/847095555225944135
Is there anything on the roadmap for this source?
Not yet, but we have been experimenting with OpenDAL, which was recently used to add a WebHDFS sink, and does have support for GCS. It could be an avenue to experiment with if anyone wants to take a shot at this.
@jszwedko Would you accept a PR that implements this in roughly the same way that the aws_s3 source is implemented, i.e. via event notifications in a PubSub topic?
Hey! Yes, I think that would make sense as the initial implementation to match the behavior of the aws_s3 source.
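For anyone picking this up, here is a minimal sketch of the notification-filtering step such a source would need, written in Python for brevity rather than in Vector's own Rust. It assumes the standard Cloud Storage Pub/Sub notification attributes (`eventType`, `bucketId`, `objectId`). The function name `object_to_fetch` and the `file_matches` parameter (borrowed from the Logstash example above) are hypothetical and not part of any Vector API. Matching the whole object name against the pattern is also an assumption.

```python
import re
from typing import Mapping, Optional, Tuple


def object_to_fetch(
    attributes: Mapping[str, str],
    file_matches: str = r".*\.log",
) -> Optional[Tuple[str, str]]:
    """Return (bucket, object_name) if the notification describes a newly
    written object whose name matches file_matches, otherwise None.

    `attributes` is the attribute map of one Pub/Sub message produced by a
    Cloud Storage notification configuration.
    """
    # Only newly created (or overwritten) objects are worth downloading.
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None

    bucket = attributes.get("bucketId")
    name = attributes.get("objectId")
    if not bucket or not name:
        return None

    # Assumption: the pattern must match the entire object name.
    if re.fullmatch(file_matches, name) is None:
        return None

    return bucket, name


# Example notification attributes (values are made up for illustration).
sample = {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "my_log_bucket",
    "objectId": "app/server.log",
}
target = object_to_fetch(sample)
```

A real source would then download the returned object, decode it, and acknowledge the Pub/Sub message only after the events are delivered, which is roughly how the aws_s3 source treats SQS messages.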
Hi, @jszwedko. I'm willing to help implement the GCS source, but I might not have time to complete the full documentation. Do you think it's a good idea to start the implementation first? For example, all content under src/sinks/webhdfs but not website/**/webhdfs.
Hey! That'd be great! I think starting with the implementation makes sense. We can help with the docs if you get stuck.
That's really appreciated. I will find some time next week to get started.
this would be fantastic - a big help to our team as we move into GCP
@jszwedko Our team has been looking to enable this feature. We don't want to step on toes if this is under active development, but if not, we'd love to submit a PR.
Hi @deangalvin-cb, we are currently not developing this integration. We would be delighted to review a PR.
@pront https://github.com/vectordotdev/vector/pull/23916 this is a VERY early first pass, would love a look!