
Datalib is inferring my string as a number

Open ferndot opened this issue 7 years ago • 3 comments

Given the following TSV file, datalib is inferring the name column to be a number.

Example TSV:

owner_slug	slug	aggregate	name	square_footage
company_demo	1bf8caed-89d0-4547-b1f9-feac7d72e91b	TRUE	Restaurant 11057	3000

Datalib call:

datalib.tsv(
  {
    url: 'example.tsv'
  },
  function (error, data) {
    if (error) {
      console.log(error)
    } else {
      console.log(data)
    }
  }
)

ferndot avatar May 24 '18 05:05 ferndot

Thanks for the bug report. When I attempt to reproduce, I find that the type inference methods are inferring the name column to be a date, for which a timestamp number is then produced.

Strangely enough, the browser's built-in Date.parse method (at least on Chrome and in Node.js) successfully parses the example string value to a date:

new Date(Date.parse('Restaurant 11057'))
// Thu Jan 01 11057 00:00:00 GMT-0800 (PST)

Fixing this will likely require significant changes to how Date inference is performed (as we currently leverage the results from Date.parse). In the meantime, I recommend explicitly providing the desired column types to datalib rather than relying on type inference.
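As a rough sketch of that workaround: the snippet below assumes datalib's loaders accept an optional format argument whose parse property maps column names to type names ('string', 'boolean', 'integer', 'number', 'date'). The loadTyped wrapper and its dl parameter are hypothetical names used only for illustration; the column names come from the example TSV above.

```javascript
// Explicit types for the columns in the example TSV, so that no
// type inference is run on them (assumed datalib format option).
const format = {
  parse: {
    owner_slug: 'string',
    slug: 'string',
    aggregate: 'boolean',
    name: 'string',          // keep "Restaurant 11057" as text
    square_footage: 'integer'
  }
};

// Hypothetical wrapper: `dl` is the datalib module, passed in by the caller.
function loadTyped(dl, url, callback) {
  dl.tsv({ url: url }, format, callback);
}
```

It could be called as loadTyped(require('datalib'), 'example.tsv', function (error, data) { ... }), with the same callback as in the original report.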

jheer avatar May 24 '18 06:05 jheer

@jheer: we could easily fix this by using Moment.js. Here is a very simple example: http://jsfiddle.net/zcvxsbo2/2/. We could also see whether a smaller, more modern library such as date-fns or d3-time-format (which is already included) would work.

This would also make the date parser more robust, consistent, and able to support more formats.

I can provide a patch if you'd like 😄
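To illustrate the general idea without pulling in any library, here is a minimal dependency-free sketch of stricter date inference. It is not the fiddle's code and not datalib's implementation: isStrictDate and inferColumnIsDate are hypothetical helpers, and the assumption is that only ISO 8601-style strings should count as dates. A format-based parser from one of the libraries above would play the same role as the regular expression here.

```javascript
// Assumed rule: accept only YYYY-MM-DD, optionally followed by a
// T-separated time and a Z or +hh:mm / -hh:mm offset.
const ISO_DATE =
  /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:\d{2})?)?$/;

// Hypothetical helper: check the shape first, then let Date.parse
// validate the value, instead of trusting Date.parse on its own.
function isStrictDate(value) {
  if (typeof value !== 'string') return false;
  const text = value.trim();
  return ISO_DATE.test(text) && !Number.isNaN(Date.parse(text));
}

// Hypothetical helper: a column is a date column only if every
// non-empty value passes the strict check.
function inferColumnIsDate(values) {
  const present = values.filter(function (v) { return v != null && v !== ''; });
  return present.length > 0 && present.every(isStrictDate);
}
```

With this rule, a value such as 'Restaurant 11057' fails the pattern check before Date.parse is ever consulted, so the name column would not be inferred as a date.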

ferndot avatar Sep 27 '18 04:09 ferndot

Hi! I just wanted to see if there had been any changes to Date inference since this discussion. In Lyra, we've been loading datasets from vega-datasets through datalib and noticing a few incorrect type inferences involving dates. If there are currently no plans to revisit this issue, I can potentially look into it at some point, but I want to make sure I'm not duplicating the work of someone more familiar with this library first.

jonathanzong avatar Mar 26 '21 19:03 jonathanzong