label-studio
Handling missing data in time series
Hello!
I'd like to request a feature for time series in Label Studio: I'm labelling multivariate time series, and sometimes the data contains null values (for example, when a sensor did not transmit any data at a given timestamp). The issue is that by default Label Studio displays each missing value as 0, which abruptly breaks the trend of the time series. Below is an example with 20% missing values:
It would be perfect if we had an option to choose how null values are shown: as zeroes or skipped entirely. To illustrate, Google Data Studio offers this feature, described in the "Missing data" section here: https://support.google.com/datastudio/answer/7059697?hl=en. Currently, Label Studio behaves like the "Line to Zero" option; the one I'm looking for is "Linear Interpolation" ("Line Breaks" would be fine too).
Thanks!
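For anyone comparing the three options named above, here is a minimal plain-Python sketch of what each one does to a series with gaps. The function names and the sample readings are made up for illustration; this is not Label Studio's or Data Studio's code, only a toy model of the three behaviours, with `None` standing in for a missing value.

```python
def line_to_zero(values):
    """Replace every missing value with 0 (the behaviour reported in this issue)."""
    return [0.0 if v is None else v for v in values]


def linear_interpolation(values):
    """Fill gaps on a straight line between the nearest known neighbours.

    Leading and trailing gaps have only one neighbour, so they stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


def line_breaks(values):
    """Split the series into contiguous (index, value) segments; gaps are not drawn."""
    segments, current = [], []
    for i, v in enumerate(values):
        if v is None:
            if current:
                segments.append(current)
                current = []
        else:
            current.append((i, v))
    if current:
        segments.append(current)
    return segments


# Hypothetical sensor readings with three missing samples.
readings = [10.0, 12.0, None, 16.0, None, None, 22.0]

print(line_to_zero(readings))          # the y-axis now has to stretch down to 0
print(linear_interpolation(readings))  # the trend is kept, but the gaps are hidden
print(line_breaks(readings))           # the gaps stay visible as separate segments
```

Only the "Line Breaks" variant keeps missing samples distinguishable from measured ones.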
Having the same issue in Label Studio 1.5, and this is a pretty big problem for us.
First, going to zero causes scaling issues on the y-axis.
Second, zero is semantically very different from "missing." Interpolating would solve the scale issue, but it is also semantically very different from "missing." Showing the data as missing would be ideal - if we want to fill blanks, we can do that ourselves when we export to CSV.
Hey guys, is there any progress on this feature, or is it on the near-term roadmap? @makseq
Or, @gamblard @sjw9, have you figured out a workaround?
This feature is not yet implemented; however, I've created a ticket for it [LSDV-4725].
Not sure if this is a related issue, but I'm also having trouble with missing data in a CSV time series. If we put "NA" or "NULL" in the CSV, the overview graph does not plot anything after the missing period.
Also, the main graph (middle part) only plots up to the missing period. After that period, it only plots if I zoom to a time range without any missing data.
How should I treat missing values in time series? Delete the data points (e.g., jump from 2017-03-03 to 2022-04-01)? Use "NaN"? "NULL"?
Thanks
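As a possible stopgap for the "delete the data points" option asked about above, here is a small sketch that drops incomplete rows from a CSV before import. Nothing in this thread confirms how Label Studio renders the resulting jump in timestamps, so treat it as an experiment rather than a fix. The column names, the sample rows, and the set of markers treated as missing are all assumptions.

```python
import csv
import io

# Assumption: these strings (compared case-insensitively) mark a missing value.
MISSING_MARKERS = {"", "NA", "NAN", "NULL"}


def drop_incomplete_rows(csv_text):
    """Return the CSV text without any data row that contains a missing marker."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(next(reader))  # copy the header row unchanged
    for row in reader:
        if any(cell.strip().upper() in MISSING_MARKERS for cell in row):
            continue  # skip the whole timestamp if any channel is missing
        writer.writerow(row)
    return out.getvalue()


# Hypothetical two-channel series with two incomplete timestamps.
sample = (
    "time,sensor_a,sensor_b\n"
    "2017-03-01,1.0,5.0\n"
    "2017-03-02,NA,5.5\n"
    "2017-03-03,NULL,\n"
    "2022-04-01,2.0,6.0\n"
)

cleaned = drop_incomplete_rows(sample)
print(cleaned)
```

Note the trade-off for multivariate data: dropping a row also discards the valid readings from the other channels at that timestamp, and the cleaned file no longer records where the gap was.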
+1 for this feature