
Handling missing data in time series

Open gamblard opened this issue 3 years ago • 6 comments

Hello!

I'd like to request a feature for time series in Label Studio: I'm labelling multivariate time series, and sometimes the data contains null values (for example, when a sensor did not transmit any data at a given timestamp). The issue is that by default Label Studio displays a missing value as 0, abruptly breaking the trend of the time series. Below is an example with 20% missing values: image

It would be perfect if we had an option to choose how null values are shown: as zeroes or skipped entirely. To illustrate this, Google Data Studio offers this feature, described in the section "missing data" here: https://support.google.com/datastudio/answer/7059697?hl=en. Currently, Label Studio behaves like the "Line to Zero" option; the one I'm looking for is "Linear Interpolation" ("Line Breaks" would be fine too).

Thanks!

gamblard avatar Mar 22 '21 09:03 gamblard
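To make the three display options named above concrete, here is a minimal sketch of what each one does to a series with gaps. It is an illustration only, not Label Studio or Data Studio code: the function names, the use of `None` for a missing reading, and the sample values are all assumptions made for this example.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading (assumption)


def line_to_zero(values: Series) -> List[float]:
    """Replace every missing value with 0 (the behaviour reported in this issue)."""
    return [0.0 if v is None else v for v in values]


def line_breaks(values: Series) -> List[List[float]]:
    """Split the series into contiguous runs of present values; gaps are not drawn."""
    segments: List[List[float]] = []
    current: List[float] = []
    for v in values:
        if v is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append(v)
    if current:
        segments.append(current)
    return segments


def linear_interpolation(values: Series) -> Series:
    """Fill interior gaps on a straight line between the nearest present neighbours.

    Leading and trailing gaps stay None because they have only one neighbour.
    Points are assumed to be evenly spaced in time.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            result[i] = values[left] + (values[right] - values[left]) * (i - left) / span
    return result


# Hypothetical sensor readings with three missing points.
readings = [10.0, 12.0, None, None, 18.0, None, 20.0]

print(line_to_zero(readings))          # [10.0, 12.0, 0.0, 0.0, 18.0, 0.0, 20.0]
print(line_breaks(readings))           # [[10.0, 12.0], [18.0], [20.0]]
print(linear_interpolation(readings))  # [10.0, 12.0, 14.0, 16.0, 18.0, 19.0, 20.0]
```

Only "Line Breaks" keeps the information that data was missing; the other two replace it with values the sensor never reported.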

Having the same issue in Label Studio 1.5, and this is a pretty big problem for us.

First, going to zero causes scaling issues on the y-axis.

Second, zero is semantically very different than "missing." Interpolating would solve the scale issue, but is also very semantically different from "missing." Showing the data as missing would be ideal - if we want to fill blanks, we could do that ourselves when we export to CSV.

sjw9 avatar Oct 17 '22 12:10 sjw9
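The y-axis scaling point can be shown with a small sketch. It compares the range an auto-scaled axis would have to cover when missing values are plotted as 0 with the range when they are left out. The helper `y_range` and the sample readings are hypothetical and are not taken from Label Studio.

```python
from typing import List, Optional, Tuple


def y_range(values: List[Optional[float]], missing_as_zero: bool) -> Tuple[float, float]:
    """Return the (min, max) an auto-scaled y-axis would need to cover."""
    if missing_as_zero:
        plotted = [0.0 if v is None else v for v in values]
    else:
        plotted = [v for v in values if v is not None]
    return min(plotted), max(plotted)


# Hypothetical readings that vary only slightly around 100.
temperatures = [101.2, 101.5, None, 101.9, 102.3]

print(y_range(temperatures, missing_as_zero=True))   # (0.0, 102.3)
print(y_range(temperatures, missing_as_zero=False))  # (101.2, 102.3)
```

With the zero included, the real variation of about one unit is squeezed into roughly 1% of the axis, which makes it hard to see while labelling.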

Hey guys, is there any progress on this feature, or is it on the near-term roadmap? @makseq

Also, @gamblard @sjw9, have you figured out a workaround?

AndyYSWoo avatar Mar 06 '23 03:03 AndyYSWoo

This feature is not yet implemented; however, I've created a ticket about it [LSDV-4725].

makseq avatar Mar 09 '23 01:03 makseq

Not sure if it's a related issue but I'm also having some trouble dealing with missing data on a CSV time series. If we put "NA" or "NULL" in the CSV time series, the overview graph will not plot after the missing period.

image

Also, the main graph (middle part) will only plot up to the missing period. After that period, it will only plot if I zoom to a time period without any missing data.

image

How should I treat missing values in time series? Delete the data points (e.g., jump from 2017-03-03 to 2022-04-01)? Use "NaN"? "NULL"?

Thanks

dvictori avatar May 12 '23 14:05 dvictori
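For the first option raised above (deleting the data points), a minimal pre-processing sketch could look like the following. It has not been checked against Label Studio, so whether the cleaned file plots correctly there is an assumption. The set of missing-value markers, the function name, and the sample data are also hypothetical.

```python
import csv
import io

# Strings treated as "missing" (assumption; extend to match your export).
MISSING_MARKERS = {"", "NA", "NULL", "NaN"}


def drop_rows_with_missing(csv_text: str) -> str:
    """Return the CSV text with every data row that contains a missing marker removed."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if any(cell.strip() in MISSING_MARKERS for cell in row):
            continue  # skip the whole timestamp if any channel is missing
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-column series with one missing reading.
raw = (
    "time,value\n"
    "2017-03-01,1.5\n"
    "2017-03-02,NA\n"
    "2017-03-03,1.7\n"
    "2022-04-01,2.1\n"
)

cleaned = drop_rows_with_missing(raw)
print(cleaned)
```

Dropping rows leaves no non-numeric cells in the file, but the remaining points on either side of a gap will presumably be joined by a straight line, so a long gap will look like a slow trend rather than missing data.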

+1 for this feature

Zahorack avatar Nov 18 '23 21:11 Zahorack