csonv.js
Request: Guessing data types
Because the second row of the CSV file has to contain data types, you can't use csonv.js with existing CSV sources (legacy APIs you can't change, etc.). Wouldn't it be nice if Csonv could guess which data type you want from the contents of the CSV file?
For example...
name;books_owned;achievements
Alice;3.0;avidreader,commentator,spectator
Bob;1.0;ubermeister
The first column would be turned into a string, the second into a float (it matches (\d+)\.(\d+)), and the third into an array (it is a string with commas and no spaces). I believe Excel does this kind of guessing as well. It's not pretty, but it works most of the time.
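A minimal sketch of first-row guessing, in JavaScript to match the thread, applying only the three rules just described (float pattern, comma list without spaces, everything else a string). The names `guessValueType` and `guessColumnTypes` are hypothetical, not csonv.js API, and the sketch assumes a simple `;`-separated file with no quoted fields.

```javascript
// Hypothetical sketch: one guessed type per column, taken from the first data row.
// Rules: digits "." digits -> float; comma list with no spaces -> array;
// anything else -> string. Integers and booleans are deliberately not detected.
function guessValueType(value) {
  if (/^\d+\.\d+$/.test(value)) return "float";
  if (value.indexOf(",") !== -1 && value.indexOf(" ") === -1) return "array";
  return "string";
}

function guessColumnTypes(csvText, separator) {
  var lines = csvText.trim().split(/\r?\n/);
  var header = lines[0].split(separator);
  var firstRow = lines[1].split(separator); // assumes at least one data row
  var types = {};
  for (var i = 0; i < header.length; i++) {
    types[header[i]] = guessValueType(firstRow[i]);
  }
  return types;
}

var csv = "name;books_owned;achievements\n" +
          "Alice;3.0;avidreader,commentator,spectator\n" +
          "Bob;1.0;ubermeister";
console.log(guessColumnTypes(csv, ";"));
// { name: 'string', books_owned: 'float', achievements: 'array' }
```

Taking the type from the first row is also what lets Bob's `ubermeister` (no comma) still be read as a one-element array: the column has already been typed from Alice's row.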
What do you think?
Would you guess based only on the first row?
Yes, either that or some kind of best-fitting type logic: a column containing '1.3' and 'foo' would become strings, while a column containing '1' and '4.2' would become floats, if that makes sense. The latter seems harder to implement.
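The best-fit option gets simpler if the candidate types are ordered from narrowest to widest: classify every value in the column, then keep the widest class seen. A sketch under that assumption, ranking only integer, float and string; `classify`, `bestFitType` and `RANK` are hypothetical names.

```javascript
// Hypothetical sketch of best-fit guessing over a whole column.
// Types are ranked narrowest to widest; the column gets the widest
// type any of its values needs. An empty column falls back to "integer".
var RANK = ["integer", "float", "string"];

function classify(value) {
  if (/^-?\d+$/.test(value)) return "integer";
  if (/^-?\d*\.\d+$/.test(value)) return "float";
  return "string";
}

function bestFitType(values) {
  var best = 0; // index into RANK, starting at the narrowest type
  for (var i = 0; i < values.length; i++) {
    best = Math.max(best, RANK.indexOf(classify(values[i])));
  }
  return RANK[best];
}

console.log(bestFitType(["1.3", "foo"])); // string
console.log(bestFitType(["1", "4.2"]));   // float
console.log(bestFitType(["1", "4"]));     // integer
```

The trade-off is that a single stray entry such as `n/a` pushes the whole column to string.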
On Thursday, 18 August 2011, arexkun wrote:
> Would you guess based only on the first row?
One major problem I see with detecting types is that booleans are stored as integers (1 and 0), so the only way to tell the two apart is when a column contains an empty entry. Complete type guessing isn't possible while two types are indistinguishable. Apart from booleans, though, it would be quite easy.
Using 'true' and 'false'/'' like YAML does instead of 1 and 0 for booleans would solve that, but it's not what the README says.
If the author is willing to switch to true/false, guessing types becomes entirely feasible.
@archan937 : Would you be willing to switch to true/false?
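If they do decide to switch to true/false, here's a quick function I wrote to detect data types:
// Anchored patterns: the whole value must match, not just a substring
// (the unanchored /\d*.\d+/ also matched values like "foo2" as floats).
var r_int = /^-?\d+$/,
    r_float = /^-?\d*\.\d+$/;

// Guess the type of one CSV value. A comma-separated value is a list
// and gets a plural type ("integers", "floats", "booleans", "strings").
function type(str) {
  if (str.indexOf(",") !== -1) {
    var parts = str.split(","), ty = null;
    for (var x = 0; x < parts.length; x++) {
      var t = type(parts[x]);
      if (t === "string") return "strings";
      if (t === "boolean") {
        // Booleans can't be mixed with numbers.
        if (ty && ty !== "boolean") return "strings";
        ty = t;
      } else {
        // Numbers can't be mixed with booleans either;
        // integers and floats together widen to float.
        if (ty === "boolean") return "strings";
        if (t === "float" || ty !== "float") ty = t;
      }
    }
    return ty + "s";
  }
  if (r_int.test(str)) return "integer";
  if (r_float.test(str)) return "float";
  if (str === "true" || str === "false") return "boolean";
  return "string";
}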
Inferring is nifty, but I wouldn't mind being able to just pass in that line of type definitions as a parameter. That would still let me do relational lookups and specify plurals.
Being able to pass in an array as a parameter would be a quick improvement. Something like:
var keyTypes = ["integer","string","string","strings","boolean"];
Although having the script guess the types would be the best option, a parameter to override the guess would be handy when you want, say, an integer passed through as a string.
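A hypothetical sketch of guessing with an override array, combining the `keyTypes` idea with the true/false proposal. `parseCsv`, `guessType` and `convert` are illustrative names, not csonv.js API. It assumes that a `null` entry in `keyTypes` means "guess this column", that plural types are comma-separated lists, that guesses come from the first data row, and that there are no quoted fields; relational lookups are not covered.

```javascript
// Hypothetical sketch: guess each column's type from the first data row,
// unless keyTypes[i] gives an explicit type for that column.
function guessType(value) {
  var plural = value.indexOf(",") !== -1;
  var sample = plural ? value.split(",")[0] : value;
  var t;
  if (/^-?\d+$/.test(sample)) t = "integer";
  else if (/^-?\d*\.\d+$/.test(sample)) t = "float";
  else if (sample === "true" || sample === "false") t = "boolean";
  else t = "string";
  return plural ? t + "s" : t;
}

// Convert a raw cell; plural types ("strings", "integers", ...) become arrays.
function convert(value, type) {
  if (type.charAt(type.length - 1) === "s") {
    var single = type.slice(0, -1);
    if (value === "") return [];
    return value.split(",").map(function (v) { return convert(v, single); });
  }
  switch (type) {
    case "integer": return parseInt(value, 10);
    case "float": return parseFloat(value);
    case "boolean": return value === "true"; // proposed true/false encoding
    default: return value;
  }
}

function parseCsv(text, separator, keyTypes) {
  var lines = text.trim().split(/\r?\n/);
  var header = lines[0].split(separator);
  var rows = lines.slice(1).map(function (line) { return line.split(separator); });
  var types = header.map(function (_, i) {
    // An explicit type wins; otherwise fall back to the guess.
    return (keyTypes && keyTypes[i]) || guessType(rows[0][i]);
  });
  return rows.map(function (cells) {
    var record = {};
    header.forEach(function (key, i) { record[key] = convert(cells[i], types[i]); });
    return record;
  });
}

var csv = "id;name;tags;active\n" +
          "1;Bob;ubermeister;false\n" +
          "2;Alice;avidreader,commentator;true";

// Guess every column...
console.log(parseCsv(csv, ";"));
// ...or keep "id" as a string and force "tags" to be a list.
console.log(parseCsv(csv, ";", ["string", null, "strings", null]));
```

Without `keyTypes`, `id` comes back as a number and Bob's single tag makes `tags` a plain string column; the override array fixes both without touching the file.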