lux
Ordinal Data Type
Overview
This PR addresses #240 by adding support for the ordinal data type. Currently, the only way to set a column's data type to ordinal is to call `df.set_data_type({"col_name": "ordinal"})`. Optionally, if the entries do not have a natural ordering (such as numeric or alphabetical), a custom ordering can be specified using `df.set_data_type({"col_name": "ordinal"}, order={"col_name": [ordered_lst]})`. Ordinal data types are visualized with boxplots, but because boxplots show bivariate distributions, they only appear as enhancements of a selected visualization.
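To make the role of a custom ordering concrete, here is a minimal pure-Python sketch of how an explicit order list differs from the default alphabetical sort. It is not Lux's implementation; the helper name `rank_by_order` and the sample values are assumptions made for illustration.

```python
def rank_by_order(values, order):
    """Sort category labels by their position in an explicit order list.

    Hypothetical helper for illustration only. Labels missing from
    `order` are placed at the end, keeping their original relative order.
    """
    position = {label: i for i, label in enumerate(order)}
    return sorted(values, key=lambda v: position.get(v, len(order)))


sizes = ["100-500", "<10", "10000+", "50-99", "<10"]
order = ["<10", "50-99", "100-500", "10000+"]

alphabetical = sorted(sizes)          # string comparison, not size order
custom = rank_by_order(sizes, order)  # follows the user-supplied order

print(alphabetical)
print(custom)
```

String sorting puts `"<10"` last because `<` compares greater than any digit, which is why size-like labels generally need an explicit `order`.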
Changes
- `univariate.py`: allow `ordinal` data types to be treated as `nominal` data types to create bar graphs in the `Occurrences` tab
- `frame.py`: allow the `set_data_type` function to take an optional `order` argument to specify orders on ordinal data
- `BoxPlot.py`: currently only supports Altair BoxPlots
- `Compiler.py`: allow the mark to be `box` when `n_dim == 1 and n_msr == 1 and dimension_type == "ordinal"`
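The `Compiler.py` condition can be read as a small decision rule. The sketch below is a hypothetical stand-in, not the actual Lux code: the function name `choose_mark` and the `"bar"` fallback are assumptions; only the `box` condition comes from the change list above.

```python
def choose_mark(n_dim, n_msr, dimension_type, default="bar"):
    """Return "box" for one ordinal dimension paired with one measure.

    Hypothetical helper: the `default` fallback is an assumption made
    for illustration and may not match what Lux picks in other cases.
    """
    if n_dim == 1 and n_msr == 1 and dimension_type == "ordinal":
        return "box"
    return default


print(choose_mark(1, 1, "ordinal"))   # one ordinal dimension + one measure
print(choose_mark(1, 1, "nominal"))   # dimension is not ordinal
print(choose_mark(1, 0, "ordinal"))   # no measure, so no boxplot
```

The last call mirrors the univariate case: with no measure on the other axis there is no distribution to summarize, so the box mark is not selected.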
Example Output

Codecov Report
Merging #360 (7820f1e) into master (1dbbcb9) will decrease coverage by 0.62%. The diff coverage is 50.00%.
:exclamation: Current head 7820f1e differs from pull request most recent head 19a14d8. Consider uploading reports for the commit 19a14d8 to get more accurate results
@@ Coverage Diff @@
## master #360 +/- ##
==========================================
- Coverage 84.46% 83.84% -0.63%
==========================================
Files 51 52 +1
Lines 3902 3961 +59
==========================================
+ Hits 3296 3321 +25
- Misses 606 640 +34
Impacted Files | Coverage Δ | |
---|---|---|
lux/action/univariate.py | 90.38% <ø> (ø) | |
lux/core/series.py | 53.84% <ø> (ø) | |
lux/interestingness/interestingness.py | 87.95% <ø> (ø) | |
lux/vislib/matplotlib/MatplotlibRenderer.py | 84.61% <0.00%> (-2.69%) | :arrow_down: |
lux/vislib/altair/BoxPlot.py | 21.87% <21.87%> (ø) | |
lux/vislib/altair/AltairRenderer.py | 94.59% <33.33%> (-2.59%) | :arrow_down: |
lux/action/enhance.py | 96.87% <66.66%> (-3.13%) | :arrow_down: |
lux/vislib/altair/BarChart.py | 82.66% <75.00%> (-2.19%) | :arrow_down: |
lux/core/frame.py | 81.75% <81.81%> (+0.02%) | :arrow_up: |
lux/executor/Executor.py | 79.48% <100.00%> (ø) | |
... and 2 more | | |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1dbbcb9...19a14d8.
Thanks @jinimukh!! Can we file a follow-up issue to delegate boxplot calculations to the Pandas and SQL Executor? This will help with performance by bringing the rendering cost down from that of a scatterplot to that of a boxplot (a few summary statistics plus outliers).
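To illustrate what "a few summary statistics plus outliers" could look like if computed up front, here is a minimal standard-library sketch. It is not how Lux or its executors compute anything; the function name `boxplot_stats`, the sample data, and the choice of the common 1.5 × IQR rule for whiskers and outliers are all assumptions.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Reduce a list of numbers to the values a boxplot needs to draw.

    Hypothetical helper; expects at least two data points.
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


salaries = [30, 32, 35, 36, 38, 40, 41, 43, 120]
stats = boxplot_stats(salaries)
print(stats)
```

Whatever the input size, the renderer would then receive five numbers plus the outlier points per group instead of every raw row.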
I'm wondering if ordinal data types have to be a subset of nominal data? Apart from the documentation and the actions logic (`enhance` and `univariate`), is there anything in the code that treats ordinal as a subset of nominal? For example, can we capture scenarios where the ordinal data type could be a subset of the temporal data type, such as {Summer, Winter, Fall} or {Q1, Q2, …}? It would be helpful to add an example for this.
Here are some examples that I was playing around with:
import pandas as pd
import lux  # importing lux makes df.set_data_type available on DataFrames

df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/aug_test.csv")
df = df.dropna(subset=["education_level", "company_size"])

# Ordinal column with a custom ordering
df.set_data_type(
    {"education_level": "ordinal"},
    order={"education_level": ["Primary School", "High School", "Masters", "Graduate", "Phd"]},
)
df["education_level"]

# Size buckets, listed from smallest to largest
df.set_data_type(
    {"company_size": "ordinal"},
    order={"company_size": ["<10", "10/49", "50-99", "100-500", "500-999", "1000-4999", "5000-9999", "10000+"]},
)
df["company_size"]
I was initially a bit confused about why the boxplot was not shown for the number-of-records case in univariate (until we set the intent); then I realized that a boxplot doesn't make sense for an ordinal attribute on its own. I wonder if it makes sense to have a bivariate ordinal data type tab, i.e., ordinal with respect to all measure values, so that the boxplot could be shown in the initial view. Otherwise, it would appear that setting the intent doesn't change anything.