datalib
Datalib is inferring my string as a number
Given the following TSV file, datalib is inferring the name column to be a number.
Example TSV:
owner_slug	slug	aggregate	name	square_footage
company_demo	1bf8caed-89d0-4547-b1f9-feac7d72e91b	TRUE	Restaurant 11057	3000
Datalib call:
datalib.tsv(
  {
    url: 'example.tsv'
  },
  function (error, data) {
    if (error) {
      console.log(error)
    } else {
      console.log(data)
    }
  }
)
Thanks for the bug report. When I attempt to reproduce, I find that the type inference methods are inferring the name column to be a date, for which a timestamp number is then produced.
Strangely enough, the built-in Date.parse method (at least in Chrome and Node.js) successfully parses the example string value to a date:
new Date(Date.parse('Restaurant 11057'))
// Thu Jan 01 11057 00:00:00 GMT-0800 (PST)
Fixing this will likely require significant changes to how Date inference is performed (as we currently leverage the results from Date.parse). In the meantime, I recommend explicitly providing the desired column types to datalib rather than relying on type inference.
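To illustrate the idea of explicit column types, here is a minimal, self-contained sketch. It does not use datalib: parseTsvWithTypes and the converters table are hypothetical helpers written for this example, and the type names ('string', 'number', 'boolean') are assumptions chosen for illustration. For the option datalib itself exposes, check its documentation.

```javascript
// Hypothetical helper, not part of datalib: parse TSV text using
// caller-supplied column types instead of inferring them.
const converters = {
  string: (v) => v,
  number: (v) => Number(v),
  boolean: (v) => v === 'TRUE' || v === 'true',
};

function parseTsvWithTypes(text, types) {
  const [headerLine, ...rows] = text.trim().split('\n');
  const columns = headerLine.split('\t');
  return rows.map((row) => {
    const cells = row.split('\t');
    const record = {};
    columns.forEach((column, i) => {
      // Columns without a declared type are left as strings.
      const convert = converters[types[column]] || converters.string;
      record[column] = convert(cells[i]);
    });
    return record;
  });
}

// The example row from this issue, with tab separators.
const tsv = [
  'owner_slug\tslug\taggregate\tname\tsquare_footage',
  'company_demo\t1bf8caed-89d0-4547-b1f9-feac7d72e91b\tTRUE\tRestaurant 11057\t3000',
].join('\n');

const data = parseTsvWithTypes(tsv, {
  aggregate: 'boolean',
  name: 'string',
  square_footage: 'number',
});

console.log(data[0].name); // "Restaurant 11057"
```

Because the name column is declared as a string, its value never reaches Date.parse and stays as written.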
@jheer: we could easily fix this by using Moment.js. Here is a very simple example: http://jsfiddle.net/zcvxsbo2/2/. We could also see whether a smaller, more modern library such as date-fns or d3-time-format (which is already included) would work.
This would also make the date parser more robust, consistent, and able to support more formats.
I can provide a patch if you'd like 😄
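As a rough sketch of what stricter date inference could look like, the snippet below accepts a value as a date only when it matches an explicit pattern, and only then hands it to Date.parse. It uses a plain regular expression rather than Moment.js, date-fns, or d3-time-format, and it is not datalib's implementation. The names ISO_DATE, looksLikeDate, and inferColumnIsDate are hypothetical, and limiting the check to ISO 8601-style strings is an assumption made to keep the example short.

```javascript
// Hypothetical stricter check, not datalib's implementation: a value counts
// as a date only if it matches an ISO 8601-style pattern such as
// 2015-01-01 or 2015-01-01T12:30:00Z.
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function looksLikeDate(value) {
  // Reject anything that does not match the pattern before calling
  // Date.parse, so lenient engine-specific parsing is never reached.
  if (typeof value !== 'string' || !ISO_DATE.test(value)) return false;
  return !Number.isNaN(Date.parse(value));
}

// A column is inferred as a date only if every value passes the check.
function inferColumnIsDate(values) {
  return values.length > 0 && values.every(looksLikeDate);
}

console.log(looksLikeDate('Restaurant 11057')); // false
console.log(looksLikeDate('2015-01-01')); // true
console.log(inferColumnIsDate(['2015-01-01', 'Restaurant 11057'])); // false
```

The trade-off is that formats outside the pattern are no longer detected automatically. A format-string library would cover more cases at the cost of an extra dependency.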
Hi! I just wanted to see if there had been any changes to Date inference since this discussion. In Lyra, we've been loading datasets from vega-datasets through datalib and noticing a few incorrect type inferences involving dates. If there are currently no plans to revisit this issue, I can potentially look into it at some point, but I want to make sure I'm not duplicating the work of someone more familiar with this library first.