iceberg
Iceberg is a table format for large, slow-moving tabular data
[Process forking options](https://docs.gradle.org/current/userguide/performance.html#forking_options). Gradle will run all tests in a single forked VM by default. This can be problematic if there are a lot of tests or some very memory-hungry...
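A minimal `build.gradle` sketch of the forking options mentioned above (the property names are from Gradle's `Test` task; the specific values are illustrative, not a recommendation for this project):

```groovy
// build.gradle — fork fresh test JVMs periodically and cap each fork's heap,
// so memory-hungry tests don't accumulate state in one long-lived VM.
test {
    maxHeapSize = '1g'    // per-fork heap limit (example value)
    forkEvery = 100       // start a new JVM every 100 test classes (example value)
    maxParallelForks = 4  // run up to 4 forks concurrently (example value)
}
```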
Hi, Iceberg provides support for Hadoop and Hive catalogs. However, AWS Glue doesn't provide a standard Hive service, and there is no Thrift URI to access the service. Instead, the...
Please describe the amount of work required to expand this tool to support Azure or Google Cloud "file blob" S3-alike services. I work at a (very large) company that is...
[Travis are now recommending removing the __sudo__ tag](https://blog.travis-ci.com/2018-11-19-required-linux-infrastructure-migration). "_If you currently specify __sudo: false__ in your __.travis.yml__, we recommend removing that configuration_"
Similar to the functionality in Presto, I was wondering if Glue can be substituted as an alternative implementation of a Hive metastore. Looking at the current `HiveTableOperations`, it relies...
Fixes https://github.com/Netflix/iceberg/issues/49

Initial implementation only for iceberg-api expressions. Open issues:
1. should we add implementations for Spark, Parquet, and Avro?
2. should we use case-sensitive startsWith?
3. should we...
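A hypothetical sketch of the pruning idea behind a `startsWith` predicate (the class and method names are mine, not Iceberg's API): a data file can match `startsWith(prefix)` only if the prefix falls between the column's lower and upper bounds, compared after truncating the bounds to the prefix length.

```java
// Sketch only — illustrates bound-based pruning for a startsWith predicate.
public class StartsWithSketch {
    // Returns true if a file with string bounds [lower, upper] for a column
    // might contain a value starting with prefix (conservative check).
    static boolean mightMatch(String lower, String upper, String prefix) {
        // Truncate each bound to the prefix length before comparing, so that
        // e.g. lower="apple" is compared as "app" against prefix="app".
        String lo = lower.length() > prefix.length() ? lower.substring(0, prefix.length()) : lower;
        String hi = upper.length() > prefix.length() ? upper.substring(0, prefix.length()) : upper;
        return lo.compareTo(prefix) <= 0 && hi.compareTo(prefix) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(mightMatch("apple", "banana", "app")); // true: prefix in range
        System.out.println(mightMatch("cherry", "date", "app"));  // false: range is entirely after prefix
    }
}
```

This is the case-sensitive variant; a case-insensitive `startsWith` would need to normalize both the bounds and the prefix before truncation, which is one of the open questions above.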
It would be great to add a bunch of documentation:
- Design of each part of the system (from `TableOperations` to `Expressions`, etc.)
- Tradeoffs taken in the design of...
It would be useful for consumers of Iceberg tables to be able to specify additional metadata in data files that enable them to know how to read the files. Some...
We shouldn't use `Util.getFS` every time we want a `FileSystem` object in `HadoopTableOperations`. An example of where this breaks down is when file system object caching is disabled (via `fs.<scheme>.impl.disable.cache`)....
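For reference, Hadoop's `FileSystem` cache is disabled per URI scheme with a configuration property; a sketch of the `core-site.xml` entry, using `hdfs` as the example scheme:

```xml
<!-- core-site.xml: disable FileSystem object caching for the hdfs scheme.
     With this set, each FileSystem.get() call constructs a new instance,
     so repeatedly fetching a FileSystem creates a fresh object every time
     instead of reusing a cached one. -->
<property>
  <name>fs.hdfs.impl.disable.cache</name>
  <value>true</value>
</property>
```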
(this is dependent upon the completion of #71 and #72) The partition function for external mappings is derived by parsing the paths of data files, à la Hive's format....