
Time Series Annotation

Open seveibar opened this issue 4 years ago • 14 comments

seveibar avatar Aug 07 '20 16:08 seveibar

To provide some context, this thread discusses a number of time series annotation tools - both open source and commercial.

I evaluated most of them. In the end, for the simple task I was assigned, I used matplotlib; specifically, the SpanSelector widget along with RadioButtons was good enough.
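For anyone wanting to try the same approach, here is a minimal sketch of how SpanSelector and RadioButtons can be combined. The `RangeAnnotator` class, the `launch` helper, and the label names are hypothetical, made up for illustration; only the two matplotlib widgets come from the comment above. The bookkeeping is kept separate from the GUI so it can be used without opening a window.

```python
from typing import List, Tuple

# Hypothetical label set; replace with the classes for your task.
LABELS = ["walking", "running", "idle"]


class RangeAnnotator:
    """Collects (start, end, label) tuples; independent of any GUI toolkit."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = labels
        self.current_label = labels[0]
        self.ranges: List[Tuple[float, float, str]] = []

    def set_label(self, label: str) -> None:
        # Called by the RadioButtons widget when the user picks a class.
        self.current_label = label

    def on_select(self, xmin: float, xmax: float) -> None:
        # Called by SpanSelector with the x-limits of the dragged span.
        # Zero-width spans (a plain click) are ignored.
        if xmax > xmin:
            self.ranges.append((xmin, xmax, self.current_label))


def launch(times, values, annotator: RangeAnnotator) -> None:
    """Open an interactive window (requires matplotlib and a GUI backend)."""
    import matplotlib.pyplot as plt
    from matplotlib.widgets import RadioButtons, SpanSelector

    fig, ax = plt.subplots()
    fig.subplots_adjust(left=0.3)  # leave room for the radio buttons
    ax.plot(times, values)

    radio_ax = fig.add_axes([0.05, 0.4, 0.15, 0.2])
    radio = RadioButtons(radio_ax, annotator.labels)
    radio.on_clicked(annotator.set_label)

    # The local names `radio` and `span` keep the widgets alive while the
    # window is open; widgets that are garbage collected stop responding.
    span = SpanSelector(ax, annotator.on_select, "horizontal", useblit=True)
    plt.show()
```

After closing the window, `annotator.ranges` holds the labelled spans in the order they were drawn.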

In general, there are two types of time series annotation that I'm aware of: assigning labels to a specific event (e.g. the peak of a heartbeat), which gives a sequence of (timestamp, label) pairs, and assigning labels to a range (e.g. which activity is being performed), which gives a sequence of (start, end, label) triples.
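To make the two shapes concrete, the sketch below turns range annotations into one label per sample and snaps a point annotation to its nearest sample. The function names are hypothetical, and it assumes the sample times are sorted in ascending order and that range boundaries are inclusive.

```python
from bisect import bisect_left
from typing import List, Optional, Sequence, Tuple

PointLabel = Tuple[float, str]         # (timestamp, label)
RangeLabel = Tuple[float, float, str]  # (start, end, label)


def range_labels_per_sample(
    times: Sequence[float],
    ranges: Sequence[RangeLabel],
    default: Optional[str] = None,
) -> List[Optional[str]]:
    """Return one label per sample; later ranges win where ranges overlap."""
    out: List[Optional[str]] = [default] * len(times)
    for start, end, label in ranges:
        for i, t in enumerate(times):
            if start <= t <= end:
                out[i] = label
    return out


def nearest_sample_index(times: Sequence[float], t: float) -> int:
    """Index of the sample closest to t (times must be sorted ascending)."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if times[i] - t < t - times[i - 1] else i - 1
```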

I was specifically looking for something which handled multivariate time series (e.g. acceleration xyz, magnetic field xyz etc.), with the option to display all series on the same subplot or on multiple plots.

Some applications only have a single label, others have multiple labels.

A more advanced scenario involves the user being presented with a video of the activity with play/pause and, say, a half speed mode. You need to be able to create a sync point so the time series and video are in step. To apply a label you'd probably want to click "on" at the start frame and "off" at the end frame.

david-waterworth avatar Aug 09 '20 00:08 david-waterworth

Hi @seveibar

Time series annotation tools (for generating data for deep learning) aren't quite good enough yet, so this might be a big opportunity for you. A good example of an existing annotation tool is GitHub user 'Dubrzr' with their repository called 'SignalAnnotation'. I have complex electrical signals and, as a user, I want to annotate segments and/or points by clicking. A good user interface for fast labeling would allow things like clicking and dragging the series left to right, shortcut keys, zooming into areas, dynamically adjusting the y axis as you move along the series, and an export annotations button. An upload annotations button would also be super helpful, because I could generate noisy annotations semi-automatically with preexisting functions/code and then refine them in your new time series tool :) Then I'll train an LSTM segmentation, peak detection, or classification model. Another tool out there is called 'TRAINSET' but it's too basic.

And yes @david-waterworth, a video stream of the data, with the labels being in sync, would be useful as well.

Cheers and look forward to your thoughts! Sam

davodogster avatar Sep 03 '20 23:09 davodogster

Hey guys, I'd like to add a few use cases and features we'd be happy to see in UDT regarding time series labeling. We are searching for a time series labeling tool that can help us in our robotics, agriculture and IoT ML projects.

Core functionality

  1. Multivariate time series
    • [Optional] Mark one series as a reference.
    • Visualizing all series on one chart or stacked charts.
    • [Optional] Possibility to show/hide series
  2. Support for multi-labels (more than 1 category)
  3. Point based labeling, range based labels, and whole-series classification
  4. Marking labeled timestamps in different color - not just time ranges (similar to trainset)

Nice to have

  1. Editable colors for labels
  2. Custom axis labels (or adding a variable in addition to the timestamp)
  3. Azure blob storage support for data import and labels export.

I'd be happy to discuss this further with you. Cheers, Dean.

dean-sh avatar Oct 09 '20 14:10 dean-sh

Hi everyone, I'm closing in on the time series interface, I was wondering if anyone could review the JSON format? Am I forgetting any use cases? https://github.com/UniversalDataTool/udt-format/blob/master/interfaces/time_series.md

seveibar avatar Oct 23 '20 20:10 seveibar

@seveibar

My use case has 3 files each containing 3 series (also 3 timestamp columns consisting of elapsed, epoch and string), along with a video. It would be nice if at a minimum you could support multiple aligned series in both the "timeData" and "csvUrl" elements (i.e. timestamp,value1,value2...valuen).

Are you saying "samples" will contain one of "timeData", "csvUrl", "audioUrl" or "videoUrl"? I think you'd want at least one of "timeData" or "csvUrl", and optionally one of "audioUrl" or "videoUrl"? I think it makes sense to allow either "timeData" or "csvUrl" to appear in "samples" and "audioUrl" or "videoUrl" outside it, but I'm not quite sure?

When annotating a time series based on a video, there's usually a mechanism to mark a sync point - i.e. for a video provide a (frame,timestamp) pair which then allows you to map from another frame back to a timestamp.
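A minimal sketch of that mapping, assuming a constant frame rate and a single sync point (both are assumptions; variable frame rate video or clock drift would need more than one pair). The `SyncPoint` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPoint:
    frame: int        # video frame index at which the sync event is visible
    timestamp: float  # sensor timestamp of the same event, in seconds


def frame_to_timestamp(frame: int, sync: SyncPoint, fps: float) -> float:
    """Map a video frame index to a sensor timestamp (constant fps assumed)."""
    return sync.timestamp + (frame - sync.frame) / fps


def timestamp_to_frame(timestamp: float, sync: SyncPoint, fps: float) -> int:
    """Map a sensor timestamp to the nearest video frame index."""
    return sync.frame + round((timestamp - sync.timestamp) * fps)
```

With a sync point at frame 150 and sensor time 1000.0 s in a 30 fps video, frame 180 maps to 1001.0 s, and 1001.0 s maps back to frame 180.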

Perhaps you might want to allow chart kwargs - series colour, alpha etc?

Also one thing I had to do on a recent project was to allow the annotator to zoom in - what I observed was users were turning the device on, then spending several minutes getting ready for a short experiment. So the time series was 90% nothing. I'm not sure that needs to be in the schema but it could be optional to supply initial zoom start/end values.

david-waterworth avatar Oct 23 '20 22:10 david-waterworth

@david-waterworth I totally forgot about aligned time series, I'll fix that ASAP, the draft GUI supports it!

Also w.r.t. the zoom region, totally agreed, although this may create some data redundancy if data is repeated across samples with different zoom regions.

seveibar avatar Oct 23 '20 22:10 seveibar

@seveibar Really looking forward to using your new time series labelling tool! My data is 2 aligned time series with millivolt recordings in milliseconds, from 1 to 600,000 (no dates). I want to have the option of annotating points and also segments, with multiple classes. Export as CSV is ideal, with the output being the input data plus an extra column for the label; otherwise export as JSON if CSV isn't possible. The ability to upload partial annotations and then delete and add new ones would be a great feature as well. Cheers, Sam
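As a rough illustration of that export shape (not UDT's actual export), the sketch below writes the input rows back out with two extra columns, one for segment labels and one for point labels. The function name, the column names, and the choice of two columns rather than one are all assumptions.

```python
import csv
import io
from typing import Sequence, Tuple


def export_labeled_csv(
    header: Sequence[str],
    rows: Sequence[Sequence[float]],
    segments: Sequence[Tuple[float, float, str]],
    points: Sequence[Tuple[float, str]],
) -> str:
    """Return CSV text: the input columns plus segment and point labels.

    The first element of each row is taken to be the time. A row gets the
    label of the first segment containing its time (inclusive bounds), and
    a point label only if a point annotation has exactly the same time.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(list(header) + ["segment_label", "point_label"])
    point_by_time = {t: label for t, label in points}
    for row in rows:
        t = row[0]
        seg = next((lab for start, end, lab in segments if start <= t <= end), "")
        writer.writerow(list(row) + [seg, point_by_time.get(t, "")])
    return buf.getvalue()
```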

davodogster avatar Oct 26 '20 22:10 davodogster

A few more things:

  • Support multiple datetime formats (unix timestamp, and datetime with an option for string formatting as in here)
  • I also support having the option to define the initial zoom level of the whole series.
  • Support custom text annotation (on mouse hover on the series) - for example, show values of another variable for that timestamp.

dean-sh avatar Oct 27 '20 12:10 dean-sh

I've updated the schema to support (I think) all of the use cases! Thanks for the feedback everybody!

A couple terminology notes:

  • Durations = Segments = Time Ranges = Label + Start Time + End Time
  • Timestamps = Flags = Label + Time
  • Time can be any number, date string or duration string (thanks @dean-sh for the link), though I expect unix datetime to be most common. We will need to find one or more full RFCs to make sure that we're not reinventing the wheel with our parsing of different formats.
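To show what accepting "any number, date string or duration string" could involve, here is a small parsing sketch. The accepted formats are assumptions, not the spec: plain numbers are passed through as milliseconds, duration strings are limited to a number plus one of ms/s/m/h (such as "10s"), and date strings must be ISO 8601 as understood by Python's `datetime.fromisoformat`, with naive dates treated as UTC.

```python
import re
from datetime import datetime, timezone
from typing import Union

_UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}
_DURATION_RE = re.compile(r"^(\d+(?:\.\d+)?)(ms|s|m|h)$")


def parse_time_ms(value: Union[int, float, str]) -> float:
    """Convert a number, duration string, or ISO date string to milliseconds."""
    if isinstance(value, bool):
        raise TypeError("booleans are not valid times")
    if isinstance(value, (int, float)):
        return float(value)  # assumption: bare numbers are already in ms

    text = value.strip()
    match = _DURATION_RE.match(text)
    if match:
        return float(match.group(1)) * _UNIT_MS[match.group(2)]

    # Anything else is treated as an ISO 8601 date; raises ValueError if not.
    dt = datetime.fromisoformat(text)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.timestamp() * 1000.0
```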

Some other notes:

  • initialWindow in a sample defines the initial zoom
  • Custom text labels are possible w/ allowCustomLabels for both timestamps and durations
  • Overlapping classifications on a duration or timestamp are possible, but a duration doesn't currently support multi-classification as @davodogster suggested. I consider this a UI issue rather than a format issue (the UI should identify perfectly overlapping durations as being effectively a multi-classification item). This simplifies the standard somewhat but can achieve the same effect visually.
  • @david-waterworth mentioned a sync point. I admit this is a bit outside my domain expertise and I'll have to do some research on the best way to ensure synchronization with frames while maintaining an intuitive JSON interface. We may also run into some practical issues on the web without access to ffmpeg for extracting frames. In the interest of time I think I'll introduce this feature after the initial time_series drop. It's possible we'll just need to make it a separate interface e.g. "video_duration_label".
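The idea of treating perfectly overlapping durations as one multi-classification item can be sketched as a grouping step on the consumer side. This is only an illustration: the function name and the `labels` output key are hypothetical, while `start`, `end` and `label` follow the annotation format in this thread.

```python
from collections import defaultdict
from typing import Dict, List


def group_overlapping_durations(durations: List[Dict]) -> List[Dict]:
    """Merge durations with identical start and end into one multi-label item.

    Durations that only partially overlap are left as separate items.
    Output order follows the first appearance of each (start, end) pair.
    """
    grouped = defaultdict(list)
    for d in durations:
        grouped[(d["start"], d["end"])].append(d["label"])
    return [
        {"start": start, "end": end, "labels": labels}
        for (start, end), labels in grouped.items()
    ]
```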

The latest version of the spec is here but I've pasted below to keep everything in one thread.

{
  "interface": {
    "type": "time_series",
    
    // timeFormat determines how the time axis will be displayed to the user
    // "dates": Display as dates with time
    // "none": Display time as a number. For example, if each data point was taken at a new iteration
    //         or over a short period of time
    // "duration": Display everything relative to the first data point but converted to a time. This
    //             is how a video or audio editing application might display time
    //             e.g. "1:20:00" to mean "1 hour and 20 minutes past the start"
    "timeFormat": "dates",

    "enabledTools": ["create-durations", "label-durations", "create-timestamps", "label-timestamps"],

    // Can the user manually type a new label? (free text)
    "allowCustomLabels": true,

    // Labels that can be used for durations
    "durationLabels": ["buy during this time", "sell during this time"],
    
    // Labels that can be used for timestamps
    "timestampLabels": ["earnings call starts", "CEO is ousted"],
    
    // OPTIONAL: If provided, you can layer or stack graphs
    "graphs": [ { "keyName": "value" } ] // default
    
    /*
    // Here's an example where we put two pieces of data on the same plot
    
    "graphs": [
      // if two graphs share the same row, they'll be placed on top of each other
      // if a row isn't provided, the data corresponding to the key will get its own row
      { "keyName": "val1", "row": 0 },
      { "keyName": "val2", "row": 0 }
    ]
    */
  },
  "samples": [
    {
      "timeData": [
        { "time": 0, "value": 100 },
        { "time": 1000, "value": 50 },
        //...
        
        // You can graph any "keyName" from the "graphs" array here
        { "time": 0, "val1": 0, "val2": 0 },
        { "time": 1000, "val2": 10 },
        { "time": 2000, "val1": 100 },
        { "time": 5000, "val1": 100, "val2": 100 }
      ],
      
      // This will appear in the sample after labeling, can also be provided for viewing data
      // Times will be in the same format as the "timeData", e.g. unix epoch milliseconds
      "annotation": {
        "durations": [
          { "start": 0, "end": 500, "label": "buy here" }
        ],
        "timestamps": [
          { "time": 1000, "label": "label for 1 second mark"  }
        ]
      }
    },
    // These are also valid
    { "audioUrl": "http://example.com/audio.mp3" },
    { "videoUrl": "http://example.com/video.mp4" },
    { "csvUrl": "http://example.com/csv_with_time_and_value_columns.csv" },
    { "audioUrl": "http://example.com/audio.mp3", "initialWindow": ["10s", "30s"] }
  ]
}

Screenshot to give a sense of the initial UI

image

seveibar avatar Oct 27 '20 16:10 seveibar

This is now released! Try it out and please create new issues for feedback and ideas!!

seveibar avatar Nov 06 '20 17:11 seveibar

BIG alright! Awesome will try it out! Hopefully I can upload partial annotations and modify them and then export as csv or json.

Edit: Okay, looking at the schema above, it appears we can upload annotations. Do we need to write a Python script to get our data into a dictionary for every single observation?
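Such a script can be short. The sketch below reads CSV text with a time column plus any number of value columns and builds one sample in the "timeData" shape from the pasted spec, optionally attaching existing annotations. The function name and the default column name `time` are assumptions; empty cells are skipped, matching the spec example where a row may carry only some keys.

```python
import csv
import io
from typing import Dict, Optional


def csv_to_sample(
    csv_text: str,
    time_column: str = "time",
    annotation: Optional[Dict] = None,
) -> Dict:
    """Build one {"timeData": [...]} sample from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    time_data = []
    for row in reader:
        point = {"time": float(row[time_column])}
        for key, raw in row.items():
            # Skip the time column itself and any empty or missing cells.
            if key != time_column and raw not in ("", None):
                point[key] = float(raw)
        time_data.append(point)

    sample: Dict = {"timeData": time_data}
    if annotation is not None:
        sample["annotation"] = annotation  # e.g. partial, pre-generated labels
    return sample
```

The result can be placed in the "samples" list and written out with `json.dumps`.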

davodogster avatar Nov 07 '20 00:11 davodogster

Hi @seveibar, happy belated New Year mate! Hope you are well. How would I do this kind of time series segmentation (or point) annotation using UDT?

image

Also, it would be very useful if the y axis dynamically adjusted to the min-max values of the current window. The user could also zoom in or out to change the current window length, slide along the series, and annotate it as they wish. I'm also interested in point annotations, not just segments.
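The y axis behaviour described here amounts to recomputing the limits from only the samples inside the visible window. A minimal sketch, with a hypothetical function name and a padding parameter that is an assumption rather than anything UDT exposes:

```python
from typing import Optional, Sequence, Tuple


def y_limits_for_window(
    times: Sequence[float],
    values: Sequence[float],
    window_start: float,
    window_end: float,
    padding: float = 0.05,
) -> Optional[Tuple[float, float]]:
    """Return (ymin, ymax) covering the samples visible in the window.

    `padding` is a fraction of the visible value range added on each side.
    Returns None when no samples fall inside the window.
    """
    visible = [v for t, v in zip(times, values) if window_start <= t <= window_end]
    if not visible:
        return None
    lo, hi = min(visible), max(visible)
    span = hi - lo
    # A flat signal has zero span; fall back to a fixed margin of 1.0.
    margin = span * padding if span > 0 else 1.0
    return lo - margin, hi + margin
```

Calling this whenever the window is panned or zoomed, and applying the result to the plot, gives the dynamic scaling.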

Best Regards, Sam

davodogster avatar Feb 09 '21 22:02 davodogster

Hey Sam, I think that's already possible, even with the features you described! It's a bit tricky right now to import the data, however: you have to put it in the UDT format JSON or in a compatible CSV. How is the data stored? Let's try loading it in!

seveibar avatar Feb 09 '21 22:02 seveibar

Hi @seveibar, I just uninstalled the Desktop version, which was an older version, and have now downloaded a newer version. I can't remember how to install it. There are so many files and none of them seem like obvious .exe / application files. I'm on Windows.

EDIT: Oh, it turns out the .exe doesn't exist for some of the newer versions, so there is no .exe to download. I will install a slightly older version.

davodogster avatar Feb 09 '21 23:02 davodogster