iceberg-go

Implement Other Filesystems Using Go CDK

Open • srilman opened this issue 1 year ago • 3 comments

Feature Request / Improvement

Can we add the Go CDK as a File IO option, particularly for remote write support?

For context, the Go CDK (https://gocloud.dev/) is a semi-official library for interacting with various cloud services through common APIs. For the purposes of this library, the blob module (https://pkg.go.dev/gocloud.dev/blob#pkg-overview) provides the following interfaces for object stores:

  • io/fs.FS
  • io/fs.SubFS
  • io.Writer through Bucket.NewWriter
  • io/fs.File and io.Seeker through Bucket.NewReader
  • Bucket.Delete for removing blobs

It supports the following storage backends:

  • Local Filesystem (although I wouldn't use this one; the current LocalFs is simpler)
  • Memory-Based FS for testing
  • S3 for AWS Go SDK V1 and V2
  • Azure Blob
  • Google Cloud Storage

I find this preferable to other options like Afero because it is maintained and releases more often. Plus, it seems more closely tied to the Go team.

srilman • Jun 02 '24 01:06

In addition, this library is one of the few available that supports writes through AWS SDK v2. The library we're using right now, S3IOFs, doesn't, for example, and other libraries I've looked at (like VFS) only support v1.

srilman • Jun 02 '24 01:06

@zeroshade I think I have something working on my end. Once I get the green light, I'm happy to open a PR.

srilman • Jun 05 '24 17:06

@srilman I'd be happy to review a PR for this, particularly if it simplifies the file io stuff while getting us more storage back ends. I'm currently traveling for a conference, but I'll be able to review the PR next week or the week after. Thanks!

zeroshade • Jun 07 '24 20:06