[FLINK-37329][table-planner] Skip Source Stats Collection When table.optimizer.source.report-statistics-enabled is False
What is the purpose of the change
Currently, setting "table.optimizer.source.report-statistics-enabled" to false does not disable statistics collection in all cases. When running a batch workload that reads the TPC-DS data set from Hive tables, both table and column statistics were still collected even though "table.optimizer.source.report-statistics-enabled" was set to false.
Brief change log
Skip stats computation in FlinkRecomputeStatisticsProgram.java when "table.optimizer.source.report-statistics-enabled" is false.
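For illustration only, the guard described above can be sketched in plain Java. This is not the code from this PR: the class `StatsGuardSketch`, the method `recomputeRowCount`, and the `Map`-based configuration are hypothetical stand-ins for the planner types, and the default of `true` for the option is an assumption made for the sketch. Only the option key is taken from the description.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: models the early-return guard with plain JDK types,
// not with the real Flink planner classes.
final class StatsGuardSketch {
    // Option key as quoted in this PR description.
    static final String REPORT_STATS_KEY =
            "table.optimizer.source.report-statistics-enabled";

    // Assumption for this sketch: the option counts as enabled when it is not set.
    static boolean isReportEnabled(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.getOrDefault(REPORT_STATS_KEY, "true"));
    }

    // Returns empty without invoking the collector when the option is false,
    // so no statistics work is done at all in that case.
    static Optional<Long> recomputeRowCount(
            Map<String, String> conf, Supplier<Long> collector) {
        if (!isReportEnabled(conf)) {
            return Optional.empty();
        }
        return Optional.of(collector.get());
    }

    public static void main(String[] args) {
        Map<String, String> disabled = Map.of(REPORT_STATS_KEY, "false");
        Map<String, String> defaults = Map.of();
        // The supplier stands in for an expensive statistics scan.
        System.out.println("disabled: " + recomputeRowCount(disabled, () -> 1000L));
        System.out.println("defaults: " + recomputeRowCount(defaults, () -> 1000L));
    }
}
```

The point of the sketch is that the check runs before any collector is invoked, so a disabled option costs nothing beyond the configuration lookup.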
Verifying this change
This change is a trivial rework / code cleanup without any test coverage.
Additionally ran the following test:
[INFO]
[INFO] -------------------------------------------------------
[INFO] T E S T S
[INFO] -------------------------------------------------------
[INFO] Running org.apache.flink.connector.file.table.FileSystemStatisticsReportTest
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 5.578 s -- in org.apache.flink.connector.file.table.FileSystemStatisticsReportTest
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 17, Failures: 0, Errors: 0, Skipped: 0
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 05:09 min
[INFO] Finished at: 2025-02-15T11:51:58+05:30
[INFO] ------------------------------------------------------------------------
Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): (no)
- The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
- The serializers: (no)
- The runtime per-record code paths (performance sensitive): (no)
- Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
- The S3 file system connector: (no)
Documentation
- Does this pull request introduce a new feature? (no)
- If yes, how is the feature documented? (not applicable)
CI report:
- 3b79dd61d83e7894d806901120a66067486db35b Azure: SUCCESS
- c79f478cf664484e608c11cc5b07c646abd8b829 UNKNOWN
Bot commands
The @flinkbot bot supports the following commands:
- @flinkbot run azure: re-run the last Azure build
@reswqa @JunRuiLee @dawidwys Could you please review the changes ?
Please could you add a unit test.
Sure, I will add a UT for this.
@davidradl - I have addressed your comments. Could you please review the same ?
@twalthr @JunRuiLee Could you please review the changes ?
Sorry @shameersss1 I am not very familiar with this part of the logic, maybe @xuyangzhong can provide some suggestions.
Thanks @JunRuiLee for the pointers. @davidradl @xuyangzhong Could you please review the changes?
@dawidwys @twalthr @xuyangzhong - Gentle reminder for review
Thanks @davidradl for the review.
@JunRuiLee - Could you please point to anyone else who knows this flow and can do the review ?
getPartitionsTableStats
Thanks a lot @xuyangzhong for the review
- Yes, you are correct: the option never stated that it skips stats collection from the catalog.
- Fetching stats from the catalog may not be a good option in all cases; in some cases it is better to just turn it off.
- In order to do that, I propose that we either reuse the same config and skip stats altogether, for both source and catalog, or introduce a different config to do the same.
@xuyangzhong Any thoughts on the above ?
@shameersss1 Whether we modify the scope of the current configuration or introduce a new one, it's advisable to implement the change through a FLIP, as these configurations are part of the public API.
This PR is being marked as stale since it has not had any activity in the last 90 days. If you would like to keep this PR alive, please leave a comment asking for a review. If the PR has merge conflicts, update it with the latest from the base branch.
If you are having difficulty finding a reviewer, please reach out to the community, contact details can be found here: https://flink.apache.org/what-is-flink/community/
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.