ERROR: no binary output function available for type theta_sketch
PostgreSQL 15.7
Citus 12.1.-1
datasketches 1.6.0
select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
explain analyze select report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach))
from
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
from report_day where service_account_id='599267'
group by report_date, campaign_id,representative_id
) a
group by report_date, campaign_id
order by campaign_id, report_date
limit 10;
ERROR: 42883: no binary output function available for type theta_sketch
CONTEXT: while executing command on postgres-citus09:5432
LOCATION: ReportResultError, remote_commands.c:324
Time: 1248.635 ms (00:01.249)
explain select report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach))
from
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
from report_day where service_account_id='599267'
group by report_date, campaign_id,representative_id
) a
group by report_date, campaign_id
order by campaign_id, report_date
limit 10;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=757.32..757.35 rows=10 width=20)
-> Sort (cost=757.32..757.82 rows=200 width=20)
Sort Key: remote_scan.campaign_id, remote_scan.report_date
-> HashAggregate (cost=750.00..753.00 rows=200 width=20)
Group Key: remote_scan.campaign_id, remote_scan.report_date
-> Custom Scan (Citus Adaptive) (cost=0.00..0.00 rows=100000 width=44)
Task Count: 128
Tasks Shown: One of 128
-> Task
Node: host=postgres-citus02 port=5432 dbname=citus_poc
-> Subquery Scan on a (cost=97482.75..108129.48 rows=31046 width=44)
-> Finalize GroupAggregate (cost=97482.75..107819.02 rows=31046 width=52)
Group Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
-> Gather Merge (cost=97482.75..106654.80 rows=62092 width=52)
Workers Planned: 2
-> Partial GroupAggregate (cost=96482.73..98487.82 rows=31046 width=52)
Group Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
-> Sort (cost=96482.73..96806.13 rows=129361 width=147)
Sort Key: report_day.report_date, report_day.campaign_id, report_day.representative_id
-> Parallel Append (cost=309.83..78549.31 rows=129361 width=147)
-> Parallel Bitmap Heap Scan on report_day_p20240606_104824 report_day_1 (cost=372.08..5512.13 rows=15365 width=147)
Recheck Cond: (service_account_id = '599267'::bigint)
-> Bitmap Index Scan on report_day_p20240606_service_account_id_report_date_idx_104824 (cost=0.00..362.86 rows=36875 width=0)
Index Cond: (service_account_id = '599267'::bigint)
-> Parallel Bitmap Heap Scan on report_day_p20240608_105080 report_day_3 (cost=309.83..5381.77 rows=12555 width=147)
Recheck Cond: (service_account_id = '599267'::bigint)
-> Bitmap Index Scan on report_day_p20240608_service_account_id_report_date_idx_105080 (cost=0.00..302.29 rows=30133 width=0)
Index Cond: (service_account_id = '599267'::bigint)
....
Hi.
When using the datasketches parallel aggregate function on distributed tables in Citus, an error occurs as seen in 'explain analyze'. However, if I run a query without an analyze or use only 'explain', it works fine.
Hey @biber-baek,
Could you share the commands you used to create this distributed table so that I can try reproducing this on my end?
Any commands you used to create & distribute the table and to create the underlying column types would help a lot.
Thanks!
@onurctirtir thanks for your reply. It is reproduced by the command below.
CREATE TABLE report_day (
report_date timestamp without time zone NOT NULL,
campaign_id bigint NOT NULL,
ad_id bigint NOT NULL,
account_id bigint NOT NULL,
sketch_col public.theta_sketch
);
select create_distributed_table('report_day','ad_id');
insert into report_day select now() - (g%10 || 'day')::interval, g, trunc(random() * 100 + 1), g%10, (select public.theta_sketch_build(1)) from generate_series(1, 100) g;
select report_date, public.theta_sketch_union(sketch_col) as reach
from report_day where account_id='7'
group by report_date;
Could you please also share the definition of theta_sketch type?
I am using the datasketches extension to calculate approximate values in analytical work. The theta_sketch type is a type provided by datasketches.
[local] citus@citus:9700-18493=# explain analyze select report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach))
from
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
from report_day where service_account_id='599267'
group by report_date, campaign_id,representative_id
) a
group by report_date, campaign_id
order by campaign_id, report_date
limit 10;
ERROR: column "reach_ds" does not exist
LINE 3: ...elect report_date,campaign_id, theta_sketch_union(reach_ds) ...
^
Time: 0.233 ms
[local] citus@citus:9700-18493=# explain select report_date, campaign_id, theta_sketch_get_estimate(theta_sketch_union(reach))
from
(select report_date,campaign_id, theta_sketch_union(reach_ds) as reach
from report_day where service_account_id='599267'
group by report_date, campaign_id,representative_id
) a
group by report_date, campaign_id
order by campaign_id, report_date
limit 10;
ERROR: column "reach_ds" does not exist
LINE 3: ...elect report_date,campaign_id, theta_sketch_union(reach_ds) ...
^
Time: 0.217 ms
@biber-baek When I attempted to reproduce the issue, I encountered the same error mentioned above. Could there be a missing step in the reproduction process?