Soumya Ghosh
@kevinjqliu I would like to work on this one.
Sure @amitgilad3, most likely there will be separate PRs for each of the above metadata tables. I can work on `data_files`, `all_data_files`, and `all_manifests`.
@kevinjqliu we can group the tasks in the following way: * `data_files` and `delete_files` - they are subsets of `files`, just a filter condition on the `content` field, hence they can be addressed...
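For illustration, the "subset of `files` via a filter on `content`" idea above can be sketched in plain Python. The row dicts and helper functions here are hypothetical, not PyIceberg's actual API; only the `content` codes (0 = data, 1 = position deletes, 2 = equality deletes) come from the Iceberg spec:

```python
# Iceberg content codes for entries in the `files` metadata table.
DATA, POSITION_DELETES, EQUALITY_DELETES = 0, 1, 2

# Hypothetical rows as they might appear in the `files` table.
files = [
    {"file_path": "s3://bucket/a.parquet", "content": DATA},
    {"file_path": "s3://bucket/b.parquet", "content": DATA},
    {"file_path": "s3://bucket/d1.parquet", "content": POSITION_DELETES},
    {"file_path": "s3://bucket/d2.parquet", "content": EQUALITY_DELETES},
]

def data_files(rows):
    """`data_files` is just `files` filtered to content == DATA."""
    return [r for r in rows if r["content"] == DATA]

def delete_files(rows):
    """`delete_files` is `files` filtered to the delete content types."""
    return [r for r in rows if r["content"] in (POSITION_DELETES, EQUALITY_DELETES)]
```

Since both tables share the `files` schema and differ only in the predicate, a single base implementation parameterized by the filter is enough.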
@kevinjqliu added PR #1066 for `data_files` and `delete_files`.
Hey @kevinjqliu, any thoughts on how to implement the `all_files` table? I initially thought that `all_files` returns files from all snapshots referenced in the current table metadata, hence the repetitions...
From [spark docs](https://iceberg.apache.org/docs/latest/spark-queries/#all-metadata-tables), > These tables are unions of the metadata tables specific to the current snapshot, and return metadata across all snapshots. > The "all" metadata tables may produce...
> What if you just return all unique (data+delete) files? In this case, the output will not match Spark's. Will that be okay? Also found this [PR from Iceberg](https://github.com/apache/iceberg/pull/805), >...
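The two semantics being weighed above can be sketched side by side: Spark's documented behavior is a union of the per-snapshot tables with duplicates allowed, while the alternative floated here deduplicates. The snapshot-to-files mapping below is a made-up stand-in, not PyIceberg's real data model:

```python
# Hypothetical: each snapshot lists the files reachable from it; a file
# that survives across snapshots appears once per snapshot.
snapshots = {
    "s1": ["a.parquet"],
    "s2": ["a.parquet", "b.parquet"],   # a.parquet carried over from s1
    "s3": ["b.parquet", "c.parquet"],   # b.parquet carried over from s2
}

def all_files(snaps):
    """Spark-style `all_files`: union across snapshots, no deduplication."""
    return [f for files in snaps.values() for f in files]

def all_unique_files(snaps):
    """Alternative semantics: each file reported once, first snapshot wins."""
    seen, out = set(), []
    for files in snaps.values():
        for f in files:
            if f not in seen:
                seen.add(f)
                out.append(f)
    return out
```

With this toy data, `all_files` yields five rows (with `a.parquet` and `b.parquet` repeated) while `all_unique_files` yields three, which is exactly the divergence from Spark's output being discussed.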
@kevinjqliu added PR - https://github.com/apache/iceberg-python/pull/1241 for `all_manifests`. Will get on with `all_files`, `all_data_files` and `all_delete_files` next.
Yes, I will start working on that soon; I have been busy the last few weeks, so I couldn't make any progress.
Sure @amitgilad3. You can work on `positional_deletes` and `all_entries`. `all_files`, `all_data_files`, and `all_delete_files` will use the same base implementation, and I have an approach in mind, so let me give it...