Manually setting categorical variables

Open Poshi opened this issue 4 years ago • 9 comments

I have a dataset with a column that is just an ID represented as an integer. I'd like to treat this column as categorical, but it is automatically classified as quantitative. Unfortunately, I cannot find any way to tell SandDance to treat that column as categorical. It would be nice to have a way to make this kind of adjustment. And if a way already exists, it should be made more visible, as I didn't find it despite having looked through the whole interface.

Poshi avatar Sep 04 '20 09:09 Poshi

Hi @poshi, that would be a good feature to add. Are you using the GUI or do you mean programmatically?

danmarshall avatar Sep 08 '20 19:09 danmarshall

I was using the GUI, trying some EDA. But programmatically would be good too, obviously. In fact, the programmatic case is easier to handle, as you already have your data in a known structure that you can easily modify before feeding it to SandDance. But when you are in the GUI, there's no way out.

Poshi avatar Sep 08 '20 20:09 Poshi
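The programmatic workaround mentioned above (reshaping the data before it reaches SandDance) could look roughly like the sketch below. It is not SandDance API: `labelColumn`, the `Row` type, the `id_` prefix and the sample rows are all made up for illustration. Type inference may still read a numeric-looking string such as "101" as a number, so the sketch adds a non-numeric prefix rather than only calling `String()`.

```typescript
// Hypothetical row shape: plain objects keyed by column name.
type Row = Record<string, unknown>;

/**
 * Returns a copy of `rows` in which every value of `column` becomes a
 * prefixed string label (e.g. 101 -> "id_101"), so it cannot be parsed
 * as a number. null / undefined are kept so missing values stay missing.
 * The input rows are not mutated.
 */
function labelColumn(rows: Row[], column: string, prefix = "id_"): Row[] {
  return rows.map((row) => {
    const value = row[column];
    if (value === null || value === undefined) {
      return { ...row };
    }
    return { ...row, [column]: `${prefix}${String(value)}` };
  });
}

// Made-up sample data: `sampleId` is an integer ID, not a measurement.
const data: Row[] = [
  { sampleId: 101, depth: 3.2 },
  { sampleId: 102, depth: 4.8 },
  { sampleId: 101, depth: 5.1 },
];

// `prepared` is what would be handed to the visualization instead of `data`.
const prepared = labelColumn(data, "sampleId");
```

This only helps when the data passes through your own code first, which is exactly why it does not solve the GUI case.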

Could you:

a. Have a slider in settings (default 3) that controls a distinctness threshold above which a variable would be considered quantitative? For example, if the threshold were 3, then 0 1 would be categorical, 0 1 2 would be categorical, but 0 1 2 3 would be quantitative.

b. Have a toggle in the Data Browser?

lleeoo avatar Aug 27 '21 02:08 lleeoo
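The threshold rule in option (a) can be sketched as a small function. This is only an illustration of the rule as described in the comment: `classifyNumericColumn` and `ColumnKind` are hypothetical names, not part of SandDance.

```typescript
type ColumnKind = "categorical" | "quantitative";

/**
 * Classifies a numeric column by its number of distinct values.
 * A column is quantitative only when the distinct count is strictly
 * above the threshold, matching the 0 1 / 0 1 2 / 0 1 2 3 example.
 */
function classifyNumericColumn(values: number[], threshold = 3): ColumnKind {
  const distinctCount = new Set(values).size;
  return distinctCount > threshold ? "quantitative" : "categorical";
}

const twoValues = classifyNumericColumn([0, 1, 0, 1]);    // "categorical"
const threeValues = classifyNumericColumn([0, 1, 2, 2]);  // "categorical"
const fourValues = classifyNumericColumn([0, 1, 2, 3]);   // "quantitative"
```

With a default this low, a real ID column with many distinct values would still be classified as quantitative, so the slider would need to be raised for the original use case.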

Is this being worked on? This would be a great feature to have.

To me, setting the quantitativeness of a column would be done once after importing the data and then never touched again.

Perhaps after the data load and column inference, but before presenting the chart, a dialog could pop up that allows the user to override the inference and specify the column type and quantitativeness. (Similar to what Excel or Google Sheets does when you import CSV files)

I'm not familiar with the codebase but it seems this would be done in sanddance-app and not sanddance-explorer.

nik0sc avatar Aug 16 '22 18:08 nik0sc

Actually @lleeoo's first option is better as it would be in sanddance-explorer and other apps embedding it would get the feature too. There's a distinctValueCount property in ColumnStats which could support such a feature.

So the Settings component would examine each numeric column's distinctValueCount and, if it is less than the current threshold, set the column to non-quantitative.

nik0sc avatar Aug 16 '22 18:08 nik0sc
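A minimal sketch of that pass over the columns, under stated assumptions: only `ColumnStats` and `distinctValueCount` are named in this thread. The `Column` shape, its `type` and `quantitative` fields, and `applyDistinctThreshold` are local stand-ins written for this example and may not match the real SandDance types.

```typescript
// Local stand-ins; the real SandDance interfaces may differ.
interface ColumnStats {
  distinctValueCount: number;
}

interface Column {
  name: string;
  type: "number" | "integer" | "string" | "boolean" | "date";
  quantitative: boolean;
  stats: ColumnStats;
}

/**
 * Returns new column descriptors in which every numeric column with
 * fewer distinct values than `threshold` is marked non-quantitative.
 * Non-numeric columns and the input array are left unchanged.
 */
function applyDistinctThreshold(columns: Column[], threshold: number): Column[] {
  return columns.map((column) => {
    const isNumeric = column.type === "number" || column.type === "integer";
    if (isNumeric && column.stats.distinctValueCount < threshold) {
      return { ...column, quantitative: false };
    }
    return column;
  });
}

// Made-up columns for illustration.
const columns: Column[] = [
  { name: "cluster", type: "integer", quantitative: true, stats: { distinctValueCount: 2 } },
  { name: "depth", type: "number", quantitative: true, stats: { distinctValueCount: 480 } },
  { name: "label", type: "string", quantitative: false, stats: { distinctValueCount: 2 } },
];

const adjusted = applyDistinctThreshold(columns, 3);
```

The comparison here is strictly "less than", as worded above. The earlier 0 1 2 example treats a count equal to the threshold as categorical, which would need `<=` instead, so the two proposals differ by one at the boundary.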

If nobody else is taking this then I'd like to try implementing it. So far I've managed to add a very basic toggle in the Data Browser pane for numeric columns. @danmarshall do you have any ideas or preferences about this?

nik0sc avatar Aug 17 '22 19:08 nik0sc

@nik0sc thanks for the offer! Would you mind attaching a sketch of the UI of your proposal?

danmarshall avatar Aug 17 '22 19:08 danmarshall

@danmarshall I'm thinking of a pop-up dialog where the user can choose the quantitative/categorical type for each numeric column.

[Attached screenshot: Capture2]

nik0sc avatar Aug 19 '22 16:08 nik0sc

@nik0sc looks good! Can you ensure there's a 'revert' ability?

danmarshall avatar Aug 19 '22 17:08 danmarshall
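One way a 'revert' ability could be kept cheap is to store user choices separately from the inferred values, so reverting just drops the override. The sketch below is an assumption about how the dialog's state might be modelled, not the actual implementation; `QuantitativeOverrides` and its methods are hypothetical names.

```typescript
/**
 * Tracks per-column user overrides on top of the inferred
 * quantitative/categorical classification. The inferred values are
 * never modified, so any override can be undone.
 */
class QuantitativeOverrides {
  private readonly inferred: Map<string, boolean>;
  private readonly overrides = new Map<string, boolean>();

  constructor(inferred: Map<string, boolean>) {
    // Copy so later changes to the caller's map do not leak in.
    this.inferred = new Map(inferred);
  }

  /** Effective value: the user's override if present, else the inference. */
  isQuantitative(column: string): boolean {
    const override = this.overrides.get(column);
    if (override !== undefined) {
      return override;
    }
    const inferredValue = this.inferred.get(column);
    if (inferredValue === undefined) {
      throw new Error(`Unknown column: ${column}`);
    }
    return inferredValue;
  }

  /** Records the user's choice for one column. */
  set(column: string, quantitative: boolean): void {
    if (!this.inferred.has(column)) {
      throw new Error(`Unknown column: ${column}`);
    }
    this.overrides.set(column, quantitative);
  }

  /** Restores the inferred value for one column. */
  revert(column: string): void {
    this.overrides.delete(column);
  }

  /** Restores the inferred value for every column. */
  revertAll(): void {
    this.overrides.clear();
  }
}

// Made-up example: `sampleId` was inferred as quantitative.
const state = new QuantitativeOverrides(new Map([["sampleId", true], ["depth", true]]));
state.set("sampleId", false);                             // user marks it categorical
const afterOverride = state.isQuantitative("sampleId");   // false
state.revert("sampleId");
const afterRevert = state.isQuantitative("sampleId");     // true
```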