Use Iceberg-Rust for parsing the ManifestList and Manifests
Rationale for this change
This replaces the Cython implementation for reading Avro with Iceberg-Rust. This would greatly simplify the PyIceberg project since we don't have to publish Python wheels anymore.
Are these changes tested?
Are there any user-facing changes?
@roeap Thanks, your patch fixed correctly passing through the 102: partition field 🥳
@Fokko This is awesome! thanks for your work on this one. Do we expect any performance/memory differences with reading it with the rust module?
@yogevyuval Thanks for asking, and yes, I do expect performance impact since just a part of the deserialization is cythonized. With this change, much more is pushed into Rust. This will also reduce the GIL pressure since we don't have to build the readers anymore :)
I left this PR small to focus on the essentials, but once this is in, we can also clean up a LOT of code 👍
@yogevyuval Thanks for asking, and yes, I do expect performance impact since just a part of the deserialization is cythonized. With this change, much more is pushed into Rust. This will also reduce the GIL pressure since we don't have to build the readers anymore :)
I left this PR small to focus on the essentials, but once this is in, we can also clean up a LOT of code 👍
Interesting, we have quite a bit of concurrency going on so curious to see the results. Once https://github.com/apache/iceberg-rust/pull/1328 gets merged and release i'll try and create some basic benchmarks and share the results
Waiting for this one https://github.com/apache/iceberg-rust/issues/1587 as it will make it much easier for PyIceberg to read both versions (and less work on the PyIceberg side :)
For documentation - I tested out 0.7rc and in a manifest with 50K entries the new rust version in python took more than 2X than the original version. I'm not sure if that's because a different parsing strategy (laziness) as @Fokko suggested, or something regarding objects transefrring from rust to python, but I think we should investigate before merging, as this code path is already one of the slowest in pyiceberg
@yogevyuval That's not good news, but thanks for checking 👍 I agree that we should find out what's going on before merging this. Thanks for doing some testing, that's very much appreciated