iceberg-python

How do I find if there is residual in the table scan/plan files?

Open maytasm opened this issue 1 year ago • 4 comments

Question

A table scan returns a DataScan. I can call plan_files on the DataScan to get a list of FileScanTask objects. I need to find out whether any of the files has a residual left over from the filtering. How do I do this? Thanks!

maytasm avatar Jun 01 '24 07:06 maytasm

Seems like we used to have something like https://github.com/apache/iceberg-python/commit/4f0a5c6203888ff105c1f09f41c17245f477d2ab but it's gone? @Fokko @TGooch44

maytasm avatar Jun 01 '24 20:06 maytasm

Hey @maytasm Thanks for raising this. We don't have the ResidualEvaluator today, but it would be great to add that. We can take inspiration from Java. The code that you're referring to is gone since we have built up the expression system from the ground up.

The evaluators should already be part of the codebase. Are you interested in contributing to this?

Fokko avatar Jun 03 '24 20:06 Fokko

@Fokko Thanks for getting back to me. I can look into contributing. I am not too familiar with the new pyiceberg rewrite (the current state of this library), but I was wondering if it would be something like porting over https://github.com/apache/iceberg-python/commit/4f0a5c6203888ff105c1f09f41c17245f477d2ab#diff-bd871c0e4a5ce5cb7edcb871e4a2b8084e44a432073c25db8b72e3ad4b94e16f? Or do you see any blockers or differences with the old Python residual evaluator, and/or with adding this to the FileScanTask?

maytasm avatar Jun 03 '24 20:06 maytasm

@maytasm The old evaluator might be a good starting point, as it is almost a 1-to-1 copy of the Java implementation. I would double-check whether there have been additions to the Java ResidualEvaluator in the meantime.

Fokko avatar Jun 03 '24 20:06 Fokko

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Dec 01 '24 00:12 github-actions[bot]

@Fokko I am picking this up in #1223 #1388

tusharchou avatar Dec 03 '24 12:12 tusharchou

@tusharchou Thank you, I've removed the stale label 🙌

Fokko avatar Dec 03 '24 12:12 Fokko

ResidualEvaluator has been added in #1388. Closing this issue.

kevinjqliu avatar Feb 11 '25 17:02 kevinjqliu
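
For readers unfamiliar with the term used in this thread: a residual is the part of the row filter that still has to be checked row by row once a file's partition values are known. The following is a minimal, standalone sketch of that idea. It is not the pyiceberg API and not the implementation from #1388. All class and function names here (EqualTo, GreaterThan, And, residual_for) are hypothetical, and it assumes identity partitioning only, with no partition transforms.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


# Toy expression types (hypothetical; not pyiceberg classes).
@dataclass(frozen=True)
class AlwaysTrue:
    pass


@dataclass(frozen=True)
class AlwaysFalse:
    pass


@dataclass(frozen=True)
class EqualTo:
    column: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    column: str
    value: Any


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


Expr = Union[AlwaysTrue, AlwaysFalse, EqualTo, GreaterThan, And]


def residual_for(expr: Expr, partition: Dict[str, Any]) -> Expr:
    """Return what is left of `expr` once a file's partition values are known.

    Assumes identity partitioning: `partition` maps a column name to the
    single value that every row in the file has for that column.
    """
    if isinstance(expr, And):
        left = residual_for(expr.left, partition)
        right = residual_for(expr.right, partition)
        # If either side can never match, no row in the file can match.
        if isinstance(left, AlwaysFalse) or isinstance(right, AlwaysFalse):
            return AlwaysFalse()
        # A side that always matches drops out of the conjunction.
        if isinstance(left, AlwaysTrue):
            return right
        if isinstance(right, AlwaysTrue):
            return left
        return And(left, right)

    if isinstance(expr, (EqualTo, GreaterThan)) and expr.column in partition:
        # The predicate is on a partition column, so it is fully decided
        # by the file's partition value.
        file_value = partition[expr.column]
        if isinstance(expr, EqualTo):
            matches = file_value == expr.value
        else:
            matches = file_value > expr.value
        return AlwaysTrue() if matches else AlwaysFalse()

    # Predicates on non-partition columns must still be checked per row.
    return expr


if __name__ == "__main__":
    row_filter = And(EqualTo("day", "2024-06-01"), GreaterThan("id", 5))
    # File written for the matching day: only the id predicate remains.
    print(residual_for(row_filter, {"day": "2024-06-01"}))
    # File written for another day: nothing in it can match.
    print(residual_for(row_filter, {"day": "2024-06-02"}))
```

In this toy model, a file "has a residual" when the result is anything other than AlwaysTrue. An AlwaysTrue result means every row in the file already satisfies the filter, and AlwaysFalse means the file can be skipped entirely. See #1388 for how the actual ResidualEvaluator handles this.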