superset
superset copied to clipboard
Error when using "TIME COLUMN" along with "X-AXIS" for same value for Clickhouse
When using Mixed Chart
I encountered an Error when using TIME COLUMN
along with X-AXIS
for the same value when the source is Clickhouse
.
The error looks like:
Error: :HTTPDriver for http://xx.xxx.x.84:8123 returned response code 500) Code: 215. DB::Exception: Column `dt` is not under aggregate function and not in GROUP BY. Have columns: ['count()','toStartOfDay(toDateTime(toStartOfDay(toDateTime(dt))))']: While processing toStartOfDay(toDateTime(dt)) AS dt, count() AS ...
I found a mistake in the query generation for Clickhouse:
SELECT toStartOfDay(toDateTime(`dt`)) AS `dt`,
COUNT(*) AS `count`
FROM
(SELECT dt,
id,
message
FROM default.test_table_v2) AS `virtual_table`
GROUP BY toStartOfDay(toDateTime(`dt`))
ORDER BY `count` DESC
LIMIT 10000;
The alias specified in the select SELECT toStartOfDay(toDateTime(dt)) AS dt
is then substituted for the section with grouping GROUP BY toStartOfDay(toDateTime(toStartOfDay(toDateTime(dt))))
.
How to reproduce the bug
- Create
Database
withClickhouse Connect
- Create table
CREATE TABLE IF NOT EXISTS demo(
id UInt64,
message String,
dt Date
) ENGINE = MergeTree()
ORDER BY id;
- Insert test data
INSERT INTO demo SELECT abs(toInt8(rand(2))) as id, fuzzBits('Demo',0.01) as message, '2023-02-05' as dt FROM numbers(100);
INSERT INTO demo SELECT abs(toInt8(rand(2))) as id, fuzzBits('Demo',0.01) as message, '2023-02-12' as dt FROM numbers(100);
INSERT INTO demo SELECT abs(toInt8(rand(2))) as id, fuzzBits('Demo',0.01) as message, '2023-03-01' as dt FROM numbers(100);
- Create
Dataset
based on clickhouse data - Create
Chart
based onDataset
from p. 5 with chart type `Mixed Chart
-
TIME COLUMN
=dt
-
TIME GRAIN
=Day
-
X-AXIS
=dt
- Query A
METRICS
=SUM(id)
- Query B
METRICS
=SUM(id)
see Screenshots
Expected results
In my opinion there are two ways. Or generate a query with a different alias or substitute an alias in the grouping section.
SELECT toStartOfDay(toDateTime(`dt`)) AS `dt_`,
COUNT(*) AS `count`
FROM
(SELECT dt,
id,
message
FROM default.demo) AS `virtual_table`
GROUP BY toStartOfDay(toDateTime(`dt`))
ORDER BY `count` DESC
LIMIT 10000;
SELECT toStartOfDay(toDateTime(`dt`)) AS `dt`,
COUNT(*) AS `count`
FROM
(SELECT dt,
id,
message
FROM default.demo) AS `virtual_table`
GROUP BY `dt`
ORDER BY `count` DESC
LIMIT 10000;
Screenshots
Environment
- browser type and version: Firefox/110.0
- superset version: 2.0.1
- python version: 3.8.12
- superset-helm-chart-0.8.5
- clickhouse-connect==0.5.3
Checklist
- [x] I have checked the superset logs for python stacktraces and included it here as text if there are any.
- [x] I have reproduced the issue with at least the latest released version of superset.
- [x] I have checked the issue tracker for the same issue and I haven't found one similar.
This annoying bug happens all the time. I hope it will be fixed soon.
I have it too 🙁
alias after group by should not be the same with the raw column name.
I know but such query generates Superset. I used a trick to solve this problem but it's inconvenient
I have the same issue on version 2.1.0 when trying to use timestamp as dimension in Aggregate Table.
I used a trick to solve this problem but it's inconvenient
@NikoJ can you share your workaround?
Unfortunately still here in 3.0.0.
A workaround I found is actually to edit the Date column that is related to the error to be defined as a "Custom SQL" function, like this:
Make sure that the column has the function icon after it's edited (for me i had to input a blank space and remove it before saving in above step. When it is a function, it looks like below, with the f(x) icon at the start.
This seems to mitigate the error, and so far is not producing any side effects noticed by me. Obviously not a solution worth a PR, but could be an interim solution.
EDIT:
This didn't work for another case, when I tried using a date column as the x-axis of a line chart. I fiddled around some more, and giving the column an alias withing Superset (see below) seems to be another way similar to above interim solution.
Still present in 3.0.1. The workaround above with setting a custom expression and alias works. Similar situation where we're trying to use a DateTime64 column as X axis and it's grouping by day, with inconsistent use of aliases in the generated query.
Can be fixed on ClickHouse side, with override default value for property prefer_column_name_to_alias
via user setting for default profile
<clickhouse>
<profiles>
<default>
<prefer_column_name_to_alias>1</prefer_column_name_to_alias>
....
</default>
...
Thanks @kovitals - if anyone else on this thread finds this to be a sufficient answer (or even a sufficient workaround), this would be a great documentation update on the Clickhouse docs page. A PR from anyone would be helpful on that point to close this out.
@rusackas Thanks! When I have free time, I will check it and then close this issue as solved if everything is ok.
Can be fixed on ClickHouse side, with override default value for property prefer_column_name_to_alias
via user setting for default profile
<clickhouse> <profiles> <default> <prefer_column_name_to_alias>1</prefer_column_name_to_alias> .... </default> ...
Thanks, seems to work great!
As an alternative, if someone else wants to add the setting via SQL-driven workflow, the below command sets up a new settings profile and assigns it to a user:
CREATE SETTINGS PROFILE superset SETTINGS prefer_column_name_to_alias = 1 TO *your_clickhouse_username*