superset
superset copied to clipboard
[SIP-119] Table Visualizations with Time Comparison Data
[SIP-119] Proposal for Table Visualizations with Time Comparison Data
Motivation
Enhancing our visualization capabilities on the table chart to include time comparison data will help improve and facilitate efficient data analysis. Currently there are advanced analytics with time shift and comparisons in other charts. We would like to bring them into the Table chart with a few modifications.
Proposed Change
We would like to add additional columns for the time shifted value, the delta and the percentage, as well as an arrow showing direction. The extra data columns can be both collapsed and sorted. Because this table visualization has a high amount of usage, we would like to start on the development behind a feature flag CHART_PLUGINS_EXPERIMENTAL
and plan to remove the changes from the feature flag once this feature is hardened and tested.
Preview of Table viz with time comparison data:
New or Changed Public Interfaces
Changes in the UI are shown above. All new features are planned to be included behind a feature flag. We are working to use as much of the original api as possible.
New dependencies
No new dependencies
Other relevant changes
A time comparison for Big Number was also introduced recently with similar functionality.
We would like to also add this functionality to line and bar visualizations (in a separate SIP), but as those chart types already have time shift features, the improvements proposed will be changes and additions to the existing capabilities.
Migration Plan and Compatibility
No new migrations
Rejected Alternatives
Describe alternative approaches that were considered and rejected.
Thank you for the SIP @eschutho. Given that it's a WIP, I recommend checking out this issue to help work out how to display time comparisons in line charts. Time comparisons are frequently used at Airbnb and we're excited about the possible upcoming improvements. We look forward to the final version of the SIP to start the discussions.
Thanks @michael-s-molina for the reference. It looks like we're not ready for the line and bar improvements just yet, so we will open a new SIP for those visualizations and focus just on Table for now.
Thank you for the SIP @eschutho. The proposed features look very promising and will definitely improve analysis power for Superset users. Some thoughts/questions:
-
Given that we're using icons to represent time shift information, do we have tooltips explaining the meaning of them? For example, when I hover the delta icon, does it say what it is, and especially, what's the actual comparison window applied?
-
Multiple time comparisons are supported in other chart types and are frequently used. They will be very useful for table charts where users can easily compare numerical metrics over different time shifts. Could you describe how are they going to work, specifically, how are we going to differentiate the headers and display "Actual range for comparison"?
-
How "Inherit range from time filters" behave when the time filter is set to
No filter
? -
In all other chart types, time shifts are inside the Advanced Analytics panel. Here we are proposing to show them in a different section. I'm worried this will decrease UI consistency, as users frequently associate available features with the categories that the panel represent. In other words, once they associate that time shifts are inside the Advanced Analytics, they will always expect to find that feature in the same place between plugins.
-
I understand that the current implementation of time shifts is limited and not flexible as the one we are proposing here. Assuming that all plugins will adhere to this new form of configuring a time shift, and trying to avoid the situation where time shifts behave differently between plugins, would be possible to change this globally? Apart from UI consistency, I'm also thinking about the required API/query object changes and maintaining a single implementation. If we want to do this in multiple steps, I would suggest to implement the new table features using the current time shift implementation and later work on the time shift changes globally.
-
Could you expand on what are the required changes for our query API/concepts? I think this is the main point of the SIP given the many discussions happening on https://github.com/apache/superset/pull/27355 and https://github.com/apache/superset/pull/27386 about potential solutions.
-
Given that Big Number with Time Comparison was introduced without a SIP, can we apply the resulting changes of this discussion to that plugin as well to ensure consistency?
I think what's being proposed in this SIP is really valuable. Our users frequently use these features at Airbnb, and it's a great opportunity to improve our current implementation. Thank you for driving the effort.
Thanks @michael-s-molina!
Here are some answers. I am going to defer to a few other folks to help here: 1. do we have tooltips explaining the meaning of them? I'm going to ask @yousoph for help on this one.
2. Multiple time comparisons are supported in other chart types and are frequently used. Could you describe how are they going to work. The table and bar charts do not currently have time comparisons, which is why we started here, but for Bar and Line which do have time comparisons, we are looking to see how we can best improve the existing functionality and will do a separate SIP on those chart types. If we find a better way to integrate with the existing time filters, we may need to come back to this chart type and update the filters to match. Another reason why this is being developed behind a feature flag.
3. How "Inherit range from time filters" behave when the time filter is set to No filter
?
Good question. A time filter is required in order for inherit to work.
4. In all other chart types, time shifts are inside the Advanced Analytics panel. Here we are proposing to show them in a different section. When we first looked at this chart type, the time shifts didn't seem like advanced analytics. So similar answer to above, there's likely going to be some work during Bar and Line to unify these, and we will definitely have more answers about that in the next SIP.
5. I understand that the current implementation of time shifts is limited and not flexible as the one we are proposing here. ...would be possible to change this globally? Yes, that would be the goal.
6. Could you expand on what are the required changes for our query API/concepts?
The required changes for the api are actually very limited, as the advanced analytics already perform this time shift comparison. The only new addition that I noted was that we thought it would be a small improvement in the UI to show what the resulting time range would be, like this:
I've updated the sip with what the api could look like to fetch that data. For the other api changes, those aren't a result of this feature, but rather from a bug that was found in the existing visualizations that we wanted to fix. I filed a bug report for the issue found with how the data is being fetched, and we can discuss the best solution in that ticket or a separate SIP if we find that it would be helpful.
7. Given that Big Number was introduced without a SIP, can we apply the resulting changes of this discussion to that plugin as well to ensure consistency? Yes, of course.
Taking this one @eschutho :)
Here are some answers. I am going to defer to a few other folks to help here: 1. do we have tooltips explaining the meaning of them? I'm going to ask @yousoph for help on this one.
Yes, when hovering over the column headers there will be tooltips displayed -
- Multiple time comparisons are supported in other chart types and are frequently used. Could you describe how are they going to work. The table and bar charts do not currently have time comparisons, which is why we started here, but for Bar and Line which do have time comparisons, we are looking to see how we can best improve the existing functionality and will do a separate SIP on those chart types. If we find a better way to integrate with the existing time filters, we may need to come back to this chart type and update the filters to match. Another reason why this is being developed behind a feature flag.
If you populate a temporal dimension, select a corresponding time filter, and enable advanced analytics section, the all viz except legacy nvd3 charts, have readily supported time comparison, rolling, and resample. I've opened a PR for demonstrate that Table chart with Advanced Analytics.
New or Changed Public Interfaces Changes in the UI are shown above. Because we also want to show the time range in the UI, we will need to calculate that value separately. This may require a new API endpoint. All new features are planned to be included behind a feature flag. The endpoint to fetch the time range could look like this: get_relative_time_range_schema = { "type": "object", "properties": {"base_time_range": {"type": "string"}, "shift": {"type": "string"}}, }
The original /api/time_range
already support multiple time ranges request at PR, IMO, no need to introduce a seprated endpoint
If you populate a temporal dimension...
Would it work without populating such column? If not, what would be the easiest/clearest way to help user to set it properly and read the table's data if they don't want to have such column displayed?
The original
/api/time_range
already support multiple time ranges request...
IIRC the main difference it's not the multiple ranges support but rather the shift arg that's not supported by the current endpoint. If we want to pass two ranges (current and shifted) we would have to build the shifted range in the frontend as proposed here https://github.com/apache/superset/pull/27386, but as stated there, the backend will have to build the shifted ranges too at some point so we could either duplicate logic in frontend and backend or just pass the range in the api call?
Would it work without populating such column? If not, what would be the easiest/clearest way to help user to set it properly and read the table's data if they don't want to have such column displayed?
It might be a UI/UX topic, i'm not very familiar with that.
IIRC the main difference it's not the multiple ranges support but rather the shift arg that's not supported by the current endpoint. If we want to pass two ranges (current and shifted) we would have to build the shifted range in the frontend as proposed here https://github.com/apache/superset/pull/27386, but as stated there, the backend will have to build the shifted ranges too at some point so we could either duplicate logic in frontend and backend or just pass the range in the api call?
Not true, even though to build shifted time range in backend and follow all your designs, the original endpoint also easily extend to add "shift" offset, I've added comment at schema of time range
humanize_time_range = [
{"timeRange": "2021-01-01 : 2022-02-01"},
{"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
{"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
{"timeRange": "2022-01-01 : 2023-02-01", "shift": "some-time-delta"},
]
uri = f"api/v1/time_range/?q={prison.dumps(humanize_time_range)}"
the original endpoint also easily extend to add "shift" offset, I've added [comment]...
Awesome! ❤️
Thanks @zhaoyongjie. I've removed the requirement for the new api from the doc. That's great.
+1
@eschutho is this ready for a [VOTE]?
Hey @rusackas I'm going to share some general snippets we want to introduce to better help understand the proposed changes and after that it should be ready. Thanks
Extending current /time_range
API endpoint
We are going to leverage recent changes in the API that let us use arrays as input of this API and extend the schema of it. We are adding a new shift
property so the endpoint can return the adequate since
and until
with any passed shift
applied to it using the existing get_since_until
method.
superset/views/api.py
get_time_range_schema = {
"type": ["string", "array"],
"items": {
"type": "object",
"properties": {
"timeRange": {"type": "string"},
"shift": {"type": "string"},
},
},
}
...
since, until = get_since_until(
time_range=time_range["timeRange"],
time_shift=time_range.get("shift"),
)
...
Extending how time_offset
is handled in the backend
Superset doesn't support having only textual columns (dimension) when a time_offset
comes into play. Instead we are breaking the hard requirement for a temporal column as part of the queyObject
and a xaxis_label
. Also the time_grain
is mandatory as long a temporal dimensions are used.
def get_since_until_from_query_object(
...
# The condition that checked for get_xaxis_label is removed
if flt.get("op") == FilterOperator.TEMPORAL_RANGE.value and isinstance(
flt.get("val"), str
):
...
)
Since now our time_offset
API supports textual columns we might end up running a query that has no temporal column in it, in that case we cannot apply the offset to any column using the same filter values, for that particular case where no temporal column is passed we must run the query altering the filter itself so we can perform the joins later on.
superset/common/query_context_processor.py
def processing_time_offsets(
...
if not dataframe_utils.is_datetime_series(df.get(index)):
# Lets find the first temporal filter in the filters array and change
# its val to be the result of get_since_until with the offset
for flt in query_object_clone.filter:
if flt.get(
"op"
) == FilterOperator.TEMPORAL_RANGE.value and isinstance(
flt.get("val"), str
):
time_range = cast(str, flt.get("val"))
(
new_outer_from_dttm,
new_outer_to_dttm,
) = get_since_until_from_time_range(
time_range=time_range,
time_shift=offset,
)
flt["val"] = f"{new_outer_from_dttm} : {new_outer_to_dttm}"
Additionally, we are proposing the use of Full Outer Joins
as a way to solve the missing datapoints when the query uses a temporal dimension, since SQLAlchemy cannot perform a Left Join
as it is right now using ON clauses like ON queryA.temporal_column = queryB.temporal_column + time_delta
.
The usage of the Full Outer Joins
would produce more datapoints in our visualizations but generate a more correct result set.
superset/common/query_context_processor.py
def processing_time_offsets(
...
if join_keys and not time_grain:
# only do a left join if textual dimensions are the only
# thing we care about
df = dataframe_utils.left_join_df(
left_df=df,
right_df=offset_df,
join_keys=join_keys,
rsuffix=R_SUFFIX,
)
else:
# If a temporal dimension is used, we must use full outer join to
# ensure that data is not lost if not present in both dataframes
df = dataframe_utils.full_outer_join_df(
left_df=df,
right_df=offset_df,
join_keys=join_keys,
rsuffix=R_SUFFIX,
)
Adding a new and reusable control section to charts
Advanced Analytics
controls are meant to be our source of truth for visualizations that want to use time comparison, however, our main goal is to extract some of the inner controls in it to a more compact and flexible set of controls which are called Time Comparison
.
In order to not create friction for visualizations unintentionally and collect better feedback from users, we are going to use this new section in a limited set of visualizations, meaning, until the full transition occurs to this new one, some degree of duplication is expected.
superset-frontend/packages/superset-ui-chart-controls/src/sections/timeComparison.tsx
export const timeComparisonControls: (
multi?: boolean, // Multiple is not always true as in Advanced Analytics
showCalculationType?: boolean, // Some visualizations might not want to make use of the calculation type
showFullChoices?: boolean, // We can show a limited set of choices instead of the very populated one
) =>
...
) => ({
label: t('Time Comparison'),
tabOverride: 'data',
description: t('Compare results with other time periods.'),
controlSetRows: [
[
{
name: 'time_compare',
config: {
type: 'SelectControl',
multi,
freeForm: true,
placeholder: t('Select or type a custom value...'),
label: t('Time shift'),
choices: showFullChoices ? fullChoices : reducedChoices,
// We would support the new Custom Start date and Inherit form time filter
description: ...
},
},
],
[
{
name: 'start_date_offset',
config: {
type: 'TimeOffsetControl',
label: t('shift start date'),
visibility: ({ controls }) =>
controls?.time_compare.value === 'custom',
},
},
],
[
{
name: 'comparison_type',
...
visibility: () => Boolean(showCalculationType),
},
},
],
[
{
name: 'comparison_range_label',
config: {
type: 'ComparisonRangeLabel',
multi,
visibility: ({ controls }) => Boolean(controls?.time_compare.value),
},
},
],
],
});
Vote passed