iceberg-python
Add Files metadata table
Hi @HonahX could we get your help in triggering this workflow to see if the CI succeeds?
Sorry for not following up on this, @Gowthami03B. Could you rebase so we can get this in? Thanks!
@Gowthami03B gentle ping, this is the last metadata table, and we would love to include this in the release! 🙌
@Fokko @kevinjqliu @amogh-jahagirdar Can I get a re-review here please? Want to close this asap for the release timeline :)
LGTM too. @Gowthami03B Thanks for working on this! Thanks everyone for reviewing. Let's get this last metadata table in!
Hi guys, sorry if it's not the right place to ask this question.
Do you know of a viable way to speed up `table.inspect.files()` for large tables?
Maybe you have something in mind that I could implement and contribute upstream.
I haven't profiled yet, but I guess the gist of the issue is `manifest.fetch_manifest_entry` being called synchronously and sequentially in a loop.
Offloading this call to a thread-based executor doesn't help much, probably because of the GIL, and a process-based executor is harder to implement because of the unpicklable types involved.
As of now, pyspark's `.files` metadata table collection is considerably quicker than pyiceberg's.
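For reference, here is a rough sketch of the two strategies compared above: reading manifests one by one in a loop versus fanning the reads out to a thread pool. It is not pyiceberg's actual code. `fetch_entries_sequential`, `fetch_entries_threaded`, and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in for whatever callable reads one manifest (in pyiceberg that would presumably wrap `manifest.fetch_manifest_entry`).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import chain
from typing import Callable, Iterable, List, Sequence, TypeVar

M = TypeVar("M")  # stand-in for a manifest handle (hypothetical)
E = TypeVar("E")  # stand-in for a manifest entry (hypothetical)


def fetch_entries_sequential(
    manifests: Sequence[M], fetch: Callable[[M], Iterable[E]]
) -> List[E]:
    """Baseline: read each manifest one after another in a plain loop."""
    entries: List[E] = []
    for manifest in manifests:
        entries.extend(fetch(manifest))
    return entries


def fetch_entries_threaded(
    manifests: Sequence[M],
    fetch: Callable[[M], Iterable[E]],
    max_workers: int = 8,
) -> List[E]:
    """Thread-based variant: submit one read per manifest to a pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, so the flattened
        # output matches the sequential baseline.
        per_manifest = list(pool.map(lambda m: list(fetch(m)), manifests))
    return list(chain.from_iterable(per_manifest))


def fake_fetch(manifest_id: int) -> List[str]:
    """Hypothetical reader: pretends every manifest lists three data files."""
    return [f"manifest-{manifest_id}/file-{i}.parquet" for i in range(3)]


if __name__ == "__main__":
    manifest_ids = list(range(4))
    same = fetch_entries_sequential(manifest_ids, fake_fetch) == fetch_entries_threaded(
        manifest_ids, fake_fetch
    )
    print("threaded result matches sequential:", same)
```

Threads only overlap the parts of the work that release the GIL (typically network or disk I/O). If most of the time goes into decoding entries in pure Python, the threaded variant is expected to behave much like the loop, which would be consistent with the observation above. Profiling would be needed to confirm where the time goes.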
I think there's definitely room for improvement. @DieHertz do you mind opening an issue for this?
Will do