SandDance
Manually setting categorical variables
I have a dataset with a column that is just an ID represented as an integer. I'd like to treat this column as categorical, but it is automatically classified as quantitative. Unfortunately, I cannot find any way to tell SandDance to consider that column categorical. It would be nice to add a way to make this kind of adjustment. And if a way already exists, giving it greater visibility would be good, as I didn't find it despite having looked through the whole interface.
Hi @poshi, that would be a good feature to add. Are you using the GUI or do you mean programmatically?
I was using the GUI, trying some EDA. But programmatically would be good too, obviously. In fact, the programmatic case is easier to handle, as you already have your data in a known structure that you can modify easily before feeding it to SandDance. But when you are in the GUI, there's no way out.
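For the programmatic case, one possible workaround is to rewrite the ID column before the data is handed over. The sketch below is plain TypeScript and does not call any SandDance API; `labelColumns`, the `id-` prefix, and the sample rows are made up for illustration. It assumes that a value which cannot be parsed as a number will not be inferred as quantitative, which this thread does not confirm.

```typescript
// Hypothetical helper, not part of SandDance.
type Row = Record<string, unknown>;

// Returns copies of the rows in which the named columns hold prefixed
// strings (e.g. 101 -> "id-101"), so the values no longer look numeric.
function labelColumns(rows: Row[], columnNames: string[], prefix = "id-"): Row[] {
  return rows.map((row) => {
    const copy: Row = { ...row };
    for (const name of columnNames) {
      const value = copy[name];
      // Leave missing values alone rather than producing "id-null".
      if (value !== null && value !== undefined) {
        copy[name] = prefix + String(value);
      }
    }
    return copy;
  });
}

// Made-up sample data: customerId is an integer ID, amount is a real measure.
const rows: Row[] = [
  { customerId: 101, amount: 12.5 },
  { customerId: 102, amount: 7.0 },
];
const prepared = labelColumns(rows, ["customerId"]);
```

The original rows are left untouched, so the numeric IDs remain available for joins or exports. None of this helps in the GUI, which is the case the request is about.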
Could you: a. have a slider in settings (default 3) that controls a distinctness threshold above which a variable is considered quantitative? For example, with a threshold of 3, a column with values 0 1 would be categorical, 0 1 2 would be categorical, but 0 1 2 3 would be quantitative.
b. have a toggle in the Data Browser?
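To make option (a) concrete, here is a minimal sketch of the rule as I read it from the example: count the distinct values in a numeric column and call it quantitative only when that count is above the threshold. The function name and the `ColumnKind` type are hypothetical and not taken from the SandDance codebase.

```typescript
type ColumnKind = "categorical" | "quantitative";

// Classifies a numeric column by how many distinct values it holds.
// A count equal to the threshold still counts as categorical, which
// matches the "0 1 2" case in the example with threshold 3.
function classifyByDistinctness(values: number[], threshold = 3): ColumnKind {
  const distinctCount = new Set(values).size;
  return distinctCount > threshold ? "quantitative" : "categorical";
}

const twoValues = classifyByDistinctness([0, 1]);         // "categorical"
const threeValues = classifyByDistinctness([0, 1, 2]);    // "categorical"
const fourValues = classifyByDistinctness([0, 1, 2, 3]);  // "quantitative"
```

Note that an ID column with many distinct values would still come out as quantitative under this rule, so the toggle in option (b) would still be needed for the case in the original report.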
Is this being worked on? This would be a great feature to have.
To me, setting the quantitativeness of a column would be done once after importing the data and then never touched again.
Perhaps after the data load and column inference, but before presenting the chart, a dialog could pop up that allows the user to override the inference and specify the column type and quantitativeness (similar to what Excel or Google Sheets do when you import CSV files).
I'm not familiar with the codebase but it seems this would be done in sanddance-app and not sanddance-explorer.
Actually @lleeoo's first option is better, as it would be in sanddance-explorer and other apps embedding it would get the feature too. There's a distinctValueCount property in ColumnStats which could support such a feature.
So the Settings component would examine all numeric columns' distinctValueCount and, if it is less than the current threshold, set the column to non-quantitative.
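A rough sketch of that pass over the columns, under stated assumptions: only `distinctValueCount` and `ColumnStats` are named in this thread, so the interfaces below are local stand-ins, and the `isNumeric` and `quantitative` fields as well as `applyDistinctnessThreshold` are hypothetical. The comparison is strict ("less than"), as worded above; the slider example earlier in the thread would instead treat a count equal to the threshold as categorical.

```typescript
// Local stand-ins for illustration; not the real SandDance types.
interface ColumnStatsSketch {
  distinctValueCount: number;
}
interface ColumnSketch {
  name: string;
  isNumeric: boolean;
  quantitative: boolean;
  stats: ColumnStatsSketch;
}

// Recomputes the quantitative flag of every numeric column from the
// threshold. Non-numeric columns are returned unchanged.
function applyDistinctnessThreshold(
  columns: ColumnSketch[],
  threshold: number
): ColumnSketch[] {
  return columns.map((column) =>
    column.isNumeric
      ? { ...column, quantitative: column.stats.distinctValueCount >= threshold }
      : column
  );
}

// Made-up columns for illustration.
const columns: ColumnSketch[] = [
  { name: "flag", isNumeric: true, quantitative: true, stats: { distinctValueCount: 2 } },
  { name: "price", isNumeric: true, quantitative: true, stats: { distinctValueCount: 500 } },
  { name: "city", isNumeric: false, quantitative: false, stats: { distinctValueCount: 40 } },
];
const adjusted = applyDistinctnessThreshold(columns, 3);
```

Because the flag is recomputed from the stats each time rather than only ever switched off, moving the slider back down restores columns to quantitative.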
If nobody else is taking this then I'd like to try implementing it. So far I've managed to add a very basic toggle in the Data Browser pane for numeric columns. @danmarshall do you have any ideas or preferences about this?
@nik0sc thanks for the offer! Would you mind attaching a sketch of the UI of your proposal?
@danmarshall I'm thinking of a pop-up dialog where the user can choose the quantitative/categorical type for each numeric column.
@nik0sc looks good! Can you ensure there's a 'revert' ability?
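One way a revert could work, sketched under assumptions: keep the inferred kinds untouched and store the user's choices from the dialog in a separate layer, so that reverting is just dropping an override. The `ColumnKindOverrides` class and its method names are hypothetical and not taken from the SandDance codebase.

```typescript
type ColumnKind = "categorical" | "quantitative";

// Keeps user overrides apart from the inferred kinds so that the
// inference result is never lost and can always be restored.
class ColumnKindOverrides {
  private readonly inferred: ReadonlyMap<string, ColumnKind>;
  private readonly overrides = new Map<string, ColumnKind>();

  constructor(inferred: ReadonlyMap<string, ColumnKind>) {
    this.inferred = inferred;
  }

  // The override wins when present; otherwise the inferred kind is used.
  kindOf(column: string): ColumnKind | undefined {
    return this.overrides.get(column) ?? this.inferred.get(column);
  }

  set(column: string, kind: ColumnKind): void {
    this.overrides.set(column, kind);
  }

  // Drops the override for one column, restoring the inferred kind.
  revert(column: string): void {
    this.overrides.delete(column);
  }

  // Drops every override at once.
  revertAll(): void {
    this.overrides.clear();
  }
}

// Made-up inference result for illustration.
const inferredKinds = new Map<string, ColumnKind>([
  ["customerId", "quantitative"],
  ["amount", "quantitative"],
]);
const kinds = new ColumnKindOverrides(inferredKinds);
kinds.set("customerId", "categorical");
```

After `kinds.revert("customerId")`, `kinds.kindOf("customerId")` falls back to the inferred kind again, and a "revert all" button in the dialog could map to `revertAll()`.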