pxf
How can we collect analytics from PXF logs on how predicate pushdown is being used?
Hello,
We decided to analyze how PXF filter pushdown behaves within one cluster, so we set the PXF logging level to DEBUG.
We have collected the following logs (anonymized):
- Search on the master:
[[email protected] /home/gp_bibi]# zgrep -i pushdown /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-*
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:36.723 DEBUG [1708149435-0001898039:gp_bibi:176] 2194352 --- [ponse-5219] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:36.862 DEBUG [1708149435-0001898039:gp_bibi:175] 2194352 --- [ponse-5143] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:37.044 DEBUG [1708149435-0001878278:gp_bibi:174] 2194352 --- [ponse-5205] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:37.080 DEBUG [1708149435-0001898039:gp_bibi:174] 2194352 --- [ponse-5207] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:37.709 DEBUG [1708149435-0001898039:gp_bibi:176] 2194352 --- [ponse-5219] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:37.867 DEBUG [1708149435-0001878278:gp_bibi:174] 2194352 --- [ponse-5205] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:37.879 DEBUG [1708149435-0001898039:gp_bibi:175] 2194352 --- [ponse-5143] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
/gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-00-1.log.gz:2024-02-20 22:26:38.184 DEBUG [1708149435-0001898039:gp_bibi:174] 2194352 --- [ponse-5207] o.g.p.p.h.HiveAccessor : Predicate pushdown for Hive is enabled
- Across all segments:
gpssh -f ~/gpdb_configs/gp_all_hosts.hosts 'zgrep -i pushdown /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-* | grep -v enabled'
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.258 INFO [1708149435-0002454892:gp_bibi:166] 2959961 --- [ponse-6557] o.a.h.h.q.i.o.OrcInputFormat : ORC pushdown predicate: null
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.514 DEBUG [1708149435-0002454892:gp_bibi:166] 2959961 --- [ponse-6557] o.a.h.h.q.i.o.OrcInputFormat : No ORC pushdown predicate
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.514 DEBUG [1708149435-0002454892:gp_bibi:167] 2959961 --- [ponse-6563] o.a.h.h.q.i.o.OrcInputFormat : No ORC pushdown predicate
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.515 DEBUG [1708149435-0002454892:gp_bibi:165] 2959961 --- [ponse-6540] o.a.h.h.q.i.o.OrcInputFormat : No ORC pushdown predicate
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.235 INFO [1708149435-0002454892:gp_bibi:131] 324463 --- [ponse-6513] o.a.h.h.q.i.o.OrcInputFormat : ORC pushdown predicate: null
[pх-pх] /gp_data1/greenplum-pxf6/logs/2024-02/app-2024-02-21-10-1.log.gz:2024-02-21 09:57:31.462 DEBUG [1708149435-0002454892:gp_bibi:130] 324463 --- [ponse-6519] o.a.h.h.q.i.o.OrcInputFormat : No ORC pushdown predicate
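A minimal sketch of how lines like these could be tallied, written in Python. It assumes the log layout stays exactly as in the samples above (timestamp, level, bracketed context, PID, `---`, thread, logger, message). The names `LINE_RE`, `classify`, and `summarize` and the bucket labels are hypothetical, not part of PXF. Treating `ORC pushdown predicate: null` the same as `No ORC pushdown predicate` is our own assumption.

```python
import re
from collections import Counter

# Layout taken from the sample lines: timestamp, level, [context], pid,
# "---", [thread], logger, " : ", message. Anything before the timestamp
# (gpssh host tag, file name added by zgrep) is skipped by search().
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<context>[^\]]*)\]\s+"
    r"(?P<pid>\d+)\s+---\s+"
    r"\[(?P<thread>[^\]]*)\]\s+"
    r"(?P<logger>\S+)\s+:\s+"
    r"(?P<message>.*)$"
)


def classify(message):
    """Map a pushdown-related message to a bucket (labels are our own)."""
    text = message.strip()
    if text == "Predicate pushdown for Hive is enabled":
        return "hive_pushdown_enabled"
    # Assumption: a null predicate means nothing was pushed down to ORC.
    if text in ("No ORC pushdown predicate", "ORC pushdown predicate: null"):
        return "orc_no_predicate"
    if text.startswith("ORC pushdown predicate:"):
        return "orc_predicate_present"
    return "other"


def summarize(lines):
    """Count pushdown messages per (logger, bucket) over an iterable of lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a log line in the expected layout
        message = match.group("message")
        if "pushdown" not in message.lower():
            continue
        counts[(match.group("logger"), classify(message))] += 1
    return counts
```

The output of the `zgrep` or `gpssh` commands above could be piped into a script that calls `summarize(sys.stdin)` and prints the counters. Grouping by the bracketed context field instead of the logger would be a small change to the key used in `counts`.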
Is there a way to analyze pushdown usage across the entire cluster, and what is the best way to do it (through logs, through monitoring)?
Have you had cases where you needed to analyze the behavior of PXF and FDW on a very large cluster?