SandDance
Numerical values are sorted as strings
I'm using VS Code and opened the Titanic CSV downloaded from https://www.openml.org/d/40945. If I set "fare" as one of the axes, the values are ordered as strings rather than numbers: 49.5, 5, 50, .. 512.32, 52, etc.
I think the fundamental issue is that there's no way to specify the data type per column. Maybe it would work if I sanitized the CSV by hand first, but that seems to defeat half of the purpose of this tool: to explore unknown data sets. Would it make sense to add type casting and other sanitization features?
Edit: I used another tool (CSV Edit) to find and remove a single question mark, and now "fare" is recognized as numeric.
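For anyone hitting the same thing, here is a minimal Python sketch of what seems to be going on. It is not SandDance's actual inference code; `infer_column_type` and the sample values are made up for illustration, assuming the tool treats a column as numeric only when every value parses as a number.

```python
def infer_column_type(values):
    """Hypothetical rule: "number" only if every value parses as a float."""
    try:
        for v in values:
            float(v)
    except ValueError:
        return "string"
    return "number"


# Made-up sample of "fare" values; "?" stands in for a missing value.
fares = ["49.5", "5", "50", "512.32", "52", "?"]

# A single "?" is enough to make the whole column a string column.
print(infer_column_type(fares))

# String columns sort lexicographically, which gives the mixed-up order.
print(sorted(v for v in fares if v != "?"))

# Once the "?" is removed, the column parses as numeric and sorts by value.
clean = [v for v in fares if v != "?"]
print(infer_column_type(clean))
print(sorted(clean, key=float))
```

An explicit per-column type setting would let the user force the numeric interpretation and treat the stray "?" as missing, instead of having to edit the file.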
I'm having a similar issue with a CSV file containing columns of numerical values which are wrapped in double quotes (") to protect the comma (,) decimal separators which are used in my country.
There is no way to sanitize this data short of converting all of it to use period (.) decimal separators (removing the protective double quotes in the process), which inevitably affects how the data is displayed in visualizations as well. This again defeats most of the purpose of SandDance.
We need a way to specify column data types, urgently!
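As a rough illustration of what a per-column cast could do for the decimal-comma case, here is a short Python sketch. The sample data and the `parse_decimal_comma` helper are hypothetical, and it assumes the values contain no thousands separators; it only shows that the quoted values can be read as numbers without rewriting the file.

```python
import csv
import io

# Made-up sample: decimal commas protected by double quotes.
raw = 'name,price\nA,"1,5"\nB,"12,25"\nC,"2,0"\n'


def parse_decimal_comma(text):
    """Hypothetical cast: read "12,25" as 12.25 (no thousands separators)."""
    return float(text.replace(",", "."))


# The csv module strips the protective quotes and keeps "1,5" as one field.
rows = list(csv.DictReader(io.StringIO(raw)))

# Cast only the column the user marked as numeric; the source stays untouched.
prices = [parse_decimal_comma(row["price"]) for row in rows]
print(sorted(prices))
```

The cast is applied at read time, so the original strings remain available for display in the local format.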
Duplicate of #270