Should Date parsing parse integer as year by default?

Open kanitw opened this issue 5 years ago • 6 comments

Date parsing currently treats integers as timestamps by default.

However, a common problem is that people actually store years as integers far more often than timestamps. While users can set the parse format to "%Y", most people don't know they have to do so.
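For readers who have not seen the workaround, here is a minimal sketch of what setting the parse format could look like in a Vega-Lite spec, written as a JavaScript object. The field names `year` and `value` and the inline data are made up for illustration; only the `format.parse` entry with `"date:'%Y'"` reflects the adjustment mentioned above.

```javascript
// Hypothetical Vega-Lite spec: the data and field names are illustrative only.
const spec = {
  data: {
    // Years stored as plain integers, which is the case discussed in this issue.
    values: [
      { year: 2017, value: 3 },
      { year: 2018, value: 5 },
      { year: 2019, value: 4 }
    ],
    // Ask the parser to read the `year` field with the "%Y" (four-digit year) format
    // instead of treating the integer as a millisecond timestamp.
    format: { parse: { year: "date:'%Y'" } }
  },
  mark: "line",
  encoding: {
    x: { field: "year", type: "temporal" },
    y: { field: "value", type: "quantitative" }
  }
};
```

Without the `format.parse` entry, the same integers would be read as milliseconds since the epoch, so all three points would land within a few seconds of 1970-01-01.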

Thus, I wonder if we should adjust the default (or be smarter about date type inference in Vega).

cc: @jakevdp

kanitw avatar Mar 09 '19 22:03 kanitw

A couple thoughts:

  1. If we do this, doesn't it preclude the use of timestamps as inputs? (Yes, we could check the value and use timestamps for larger integers, but this seems arbitrary, brittle, and in some cases potentially infuriating.)

  2. Your suggestion would seem to boil down to changing vega-util's toDate method. FWIW, the JS built-in Date.parse maps a single number (either as a JS number, or a string) to the first day of that year. By extension, I believe Vega should do the same for strings like "2019".

  3. Integers will by default be parsed as number values (not Dates) by Vega. So for number input one would already need to specify a Date type explicitly, right? I'm assuming the primary issue is that in VL/Altair one might indicate a temporal type and have numbers treated as dates. Is that right?

  4. Is there a reason this can't be handled at the VL level?
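To make point 2 concrete, here is a small sketch in plain JavaScript (no Vega involved) contrasting `Date.parse`, which coerces its argument to a string, with the `Date` constructor, which reads a number as milliseconds since the epoch. It only illustrates the built-in behavior and says nothing about how vega-util's `toDate` is implemented.

```javascript
// Date.parse coerces its argument to a string, so the number 2019 and the
// string "2019" are both read as the year 2019 (midnight UTC on January 1).
const parsedFromString = Date.parse("2019");
const parsedFromNumber = Date.parse(2019);

// The Date constructor, given a number, treats it as a millisecond timestamp.
const constructedFromNumber = new Date(2019).getTime();

console.log(parsedFromString === Date.UTC(2019, 0, 1)); // true
console.log(parsedFromNumber === parsedFromString);     // true
console.log(constructedFromNumber);                     // 2019 (ms after 1970-01-01T00:00:00Z)
```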

jheer avatar Mar 09 '19 23:03 jheer

> Is there a reason this can't be handled at the VL level?

Vega-Lite doesn't have access to the data, so it only knows that the data is temporal; it doesn't know what the data looks like at all.

> Integers will by default be parsed as number values (not Dates) by Vega. So for number input one would already need to specify a Date type explicitly, right? I'm assuming the primary issue is that in VL/Altair one might indicate a temporal type and have numbers treated as dates. Is that right?

Yep

> Your suggestion would seem to boil down to changing vega-util's toDate method. FWIW, the JS built-in Date.parse maps a single number (either as a JS number, or a string) to the first day of that year. By extension, I believe Vega should do the same for strings like "2019".

Given that we at least parse "2019" as a year, it might make sense to just document that integer = timestamp by default, especially given that smarter logic can be too brittle and would preclude timestamp inputs.
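The brittleness concern can be illustrated with a small sketch of a magnitude-based heuristic. `guessDate` and its four-digit cutoff are hypothetical, chosen only for illustration; nothing like this exists in Vega or vega-util.

```javascript
// Hypothetical helper, NOT part of Vega or vega-util.
// Assumed rule: four-digit integers are calendar years, everything else is a timestamp.
function guessDate(value) {
  const looksLikeYear = Number.isInteger(value) && value >= 1000 && value <= 9999;
  return looksLikeYear
    ? new Date(Date.UTC(value, 0, 1)) // first day of that year (UTC)
    : new Date(value);                // milliseconds since the epoch
}

const guessedYear = guessDate(2019);       // 2019-01-01T00:00:00.000Z
const guessedTimestamp = guessDate(10000); // 1970-01-01T00:00:10.000Z
const guessedEdge = guessDate(9999);       // read as year 9999, although 9999 ms is a valid timestamp too
```

Two adjacent inputs, 9999 and 10000, end up roughly eight thousand years apart, and small timestamps can no longer be expressed at all. That is the arbitrary cutoff objected to in the first point above.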

kanitw avatar Mar 10 '19 06:03 kanitw

Technically this would require a breaking change, so we should probably shelve it for the time being. That said, in the future we could consider updating the date parsing as you suggest here and require a parsing specifier of date:"%Q" for timestamp support (which I don't think is a particularly common need).

In any case, I just tested and d3.timeParse('%Q')(Date.now()) behaves as one would expect.

jheer avatar Mar 10 '19 06:03 jheer

Should we re-open and put it in 6.0 milestone then?

kanitw avatar Mar 10 '19 07:03 kanitw

I do really like the suggestion of parsing integers as years by default. As already mentioned above, I think this is a more common date storage format than integers as timestamps. It would be convenient to be able to simply specify that the data type is temporal in VL and have integers rendered as years. Doing that currently yields some odd results, and one must recast the column as a string to get the desired behavior.

Would you consider re-opening this and implementing this feature in Vega? We could change this on the Altair side of things and recast ints as str automatically when the temporal data type is used, but would like to avoid being inconsistent with VL here if possible.

We're also discussing this in a few other places:

  • https://github.com/altair-viz/altair/discussions/3140
  • https://github.com/hex-inc/vegafusion/issues/402

joelostblom avatar Sep 29 '23 15:09 joelostblom

reopened and moved into 6.0

domoritz avatar Sep 29 '23 16:09 domoritz