Dependency on github.com/apache/arrow-go/v18/arrow/compute greatly increases binary size
Feature Request / Improvement
We've noticed that including the compute package from arrow-go in our binary adds ~7 MB to the binary size.
At one point (bab83126b2bd5f42949b47b8299d1f28dac86e17) this package was pulled in only by the function ToRequestedSchema in arrow_utils.go, which was only used by tests. In newer versions, it appears this is now used in a few more places (arrow_scanner.go and writer.go).
It may not be possible to easily remove the dependency at this point, but just wanted to make you aware.
Perhaps it would be possible to move some of the Arrow integration stuff into a sub-package, so that depending on the main "iceberg" package (which defines types like the manifests, schema, and partition spec) doesn't pull it in as an unused transitive dependency?
It's definitely not possible to remove it as a dependency entirely, as we rely on it for filtering and other functionality needed to perform scans. Though I am surprised that the compute package alone adds that much (or is that ~7 MB the size of the entire arrow-go package?)
> Perhaps it would be possible to move some of the Arrow integration stuff into a sub-package, so that depending on the main "iceberg" package (which defines types like the manifests, schema, and partition spec) doesn't pull it in as an unused transitive dependency?
Right now all of the Arrow stuff should be confined to the iceberg/table package, so if you're only using the main iceberg package and not the table sub-package, the compiler can exclude all of the Arrow stuff. It would be a significant chunk of work, but it might be possible to push all of the scanning and such into a sub-package, which would allow most of the table operations (except scans) to not require Arrow.