torch-dataframe Using torch-dataframe for time-series

First of all, I would like to congratulate you on this great project.

I would like to know whether torch-dataframe can be used for time-series analysis, something similar to xts in R.

Example:

TimeSeries1:

| Date | Values1 |
| --- | --- |
| 2016-12-27 21:00:00 | 10.00 |
| 2016-12-27 21:01:00 | 10.01 |
| 2016-12-27 21:02:00 | 10.02 |
| 2016-12-27 21:04:00 | 10.04 |
| 2016-12-27 21:07:00 | 10.07 |

TimeSeries2:

| Date | Values2 |
| --- | --- |
| 2016-12-27 21:00:00 | 20.00 |
| 2016-12-27 21:01:00 | 20.01 |
| 2016-12-27 21:03:00 | 20.03 |
| 2016-12-27 21:05:00 | 20.05 |
| 2016-12-27 21:06:00 | 20.06 |
| 2016-12-27 21:07:00 | 20.07 |

Merge result of TimeSeries1 with TimeSeries2:

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | NA |
| 2016-12-27 21:03:00 | NA | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | NA |
| 2016-12-27 21:05:00 | NA | 20.05 |
| 2016-12-27 21:06:00 | NA | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |

Applying na.locf to the merged TimeSeries:

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | 20.01 |
| 2016-12-27 21:03:00 | 10.02 | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | 20.03 |
| 2016-12-27 21:05:00 | 10.04 | 20.05 |
| 2016-12-27 21:06:00 | 10.04 | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |

Applying na.omit to the merged TimeSeries:

| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |

Many thanks,

Danilo

Dec 27 '16 23:12 suporteavancado

This isn't available yet, but it should be easy to implement. The positions of all NAs are easy to identify because they are stored in a tds.Hash with an {id: true} structure, i.e. all you need to do is look up the id, find the element immediately before it, and use that value. Take a look at the Dataseries; you're welcome to add the functionality if you want to. Remember to write specs together with the functionality.
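
To make the idea concrete, here is a minimal plain-Lua sketch of the carry-forward step. It is not the Dataseries implementation: `locf`, `values`, and `missing` are hypothetical names, and an ordinary Lua table of the form `{[index] = true}` stands in for the tds.Hash that holds the NA positions.

```lua
-- Sketch of "last observation carried forward" (what R's na.locf does).
-- ASSUMPTION: `values` is a dense array with a placeholder at each missing
-- position, and `missing` is a set {[index] = true}. Both are stand-ins,
-- not the real Dataseries fields.
local function locf(values, missing)
  local filled, still_missing = {}, {}
  local last = nil -- most recent observed value; nil until one is seen
  for i = 1, #values do
    if missing[i] then
      if last ~= nil then
        filled[i] = last -- carry the previous observation forward
      else
        filled[i] = values[i] -- leading gap: nothing to carry forward
        still_missing[i] = true
      end
    else
      filled[i] = values[i]
      last = values[i]
    end
  end
  return filled, still_missing
end

-- Values2 column of the merged example; 0 is only a placeholder at NA rows.
local values2 = {20.00, 20.01, 0, 20.03, 0, 20.05, 20.06, 20.07}
local missing2 = {[3] = true, [5] = true}
local filled2, left2 = locf(values2, missing2)
print(table.concat(filled2, ", "))
```

Rows that are missing before any value has been observed stay flagged in the second return value, so the caller can decide whether to drop them or leave them as NA.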

Dec 28 '16 14:12 gforge

Good afternoon,

I also believe the locf function will be easy to implement, and I will try. But as I am still learning about the project, I would be grateful if you could show me how the two time series would be merged on the index column (dateTime). Just a small example, if it's possible.

Many thanks,

Danilo

Dec 28 '16 15:12 suporteavancado

Great, start by writing a spec for the merge with the two dataframes and the desired outcome. I can then try to help you with the details of putting it together.

Dec 28 '16 16:12 gforge

```lua
require 'Dataframe'

-- First series: dates and the priceA values
df1 = Dataframe()
date1 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00", "2016-12-27 21:04:00", "2016-12-27 21:07:00" }
value1 = { 10.00, 10.01, 10.02, 10.04, 10.07 }
df1:load_table{data=Df_Dict{date=date1, priceA=value1}}

-- Second series: dates and the priceB values
df2 = Dataframe()
date2 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00", "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" }
value2 = { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }
df2:load_table{data=Df_Dict{date=date2, priceB=value2}}
```

Dec 28 '16 21:12 suporteavancado

example

Dec 28 '16 22:12 suporteavancado

Ok, so you want to do a full join. There is one main problem: no matter how clever our implementation feels, it will most likely be inefficient compared to SQL solutions that have been at it for years. My general design idea for torch-dataframe is to allow simple manipulations and some other things that are good to have for building and training models. Implementing hard-core joins has therefore not been something I've aimed for. I personally prepare my datasets in R and then export them to CSV before importing them into Torch. R has the dplyr package, which is excellent for all kinds of merges etc.

Anyway, if you still want to embark on implementing the merge, then:

1. Create a timestamp type. Add a to_timestamp function to the Dataseries, as strings will be terribly inefficient to work with. I know the to_categorical function is rather slow and this will be even worse, so consider doing this in C if you have large datasets. There is a SO post that may be helpful.
2. Create a sort for both tables using torch.sort's second return value, which gives the indexes of the sort. Note that you will need to mask the missing data when sorting and append the missing elements at the end.
3. The full_join function should:
   - Sort both tables on the merge key and find the number of mismatches
   - Create new Dataseries of the correct type (or a Dataframe using add_column) that can contain the merged dataset
   - Loop through both frames at once and fill in the values in the Dataseries previously created. There will be three indexes: one for the first dataframe, one for the second dataframe, and one for the output series/frame
   - Create and return a new Dataframe with the Dataseries (you could possibly start by creating a new Dataframe of the correct size and setting its elements)
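
The steps above can be sketched in plain Lua, without Torch or torch-dataframe, roughly as follows. Everything here is an assumption made for illustration: `to_key`, `sort_order`, `full_join`, and the `NA` sentinel are hypothetical helpers, the key is a sortable number rather than a true epoch timestamp, and keys are assumed to be unique and non-missing within each series. Because Lua tables grow on demand, the sketch also skips counting mismatches to preallocate the output.

```lua
-- ASSUMPTION: NA is a hypothetical sentinel for missing cells.
local NA = "NA"

-- Turn "YYYY-MM-DD HH:MM:SS" into a number that sorts chronologically,
-- e.g. 20161227210000. Not an epoch timestamp, only a sortable key.
local function to_key(s)
  local y, mo, d, h, mi, sec = s:match("^(%d+)-(%d+)-(%d+) (%d+):(%d+):(%d+)$")
  assert(y, "unexpected date format: " .. tostring(s))
  return ((((tonumber(y) * 100 + tonumber(mo)) * 100 + tonumber(d)) * 100
    + tonumber(h)) * 100 + tonumber(mi)) * 100 + tonumber(sec)
end

-- Return the permutation that sorts `keys` (the plain-Lua analogue of the
-- index tensor that torch.sort gives as its second return value).
local function sort_order(keys)
  local order = {}
  for i = 1, #keys do order[i] = i end
  table.sort(order, function(a, b) return keys[a] < keys[b] end)
  return order
end

-- Full (outer) join of two keyed columns. Assumes unique keys per side.
local function full_join(keys1, vals1, keys2, vals2)
  local o1, o2 = sort_order(keys1), sort_order(keys2)
  local out = {key = {}, v1 = {}, v2 = {}}
  local i, j, k = 1, 1, 0 -- indexes: first frame, second frame, output
  while i <= #o1 or j <= #o2 do
    local a = o1[i] and keys1[o1[i]] -- nil once the first frame is exhausted
    local b = o2[j] and keys2[o2[j]] -- nil once the second frame is exhausted
    k = k + 1
    if b == nil or (a ~= nil and a < b) then
      out.key[k], out.v1[k], out.v2[k] = a, vals1[o1[i]], NA
      i = i + 1
    elseif a == nil or b < a then
      out.key[k], out.v1[k], out.v2[k] = b, NA, vals2[o2[j]]
      j = j + 1
    else -- same key on both sides
      out.key[k], out.v1[k], out.v2[k] = a, vals1[o1[i]], vals2[o2[j]]
      i, j = i + 1, j + 1
    end
  end
  return out
end

-- The two series from the example earlier in the thread.
local date1 = {"2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00",
               "2016-12-27 21:04:00", "2016-12-27 21:07:00"}
local value1 = {10.00, 10.01, 10.02, 10.04, 10.07}
local date2 = {"2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00",
               "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00"}
local value2 = {20.00, 20.01, 20.03, 20.05, 20.06, 20.07}

local keys1, keys2 = {}, {}
for i, s in ipairs(date1) do keys1[i] = to_key(s) end
for i, s in ipairs(date2) do keys2[i] = to_key(s) end

local merged = full_join(keys1, value1, keys2, value2)
for r = 1, #merged.key do
  print(merged.key[r], merged.v1[r], merged.v2[r])
end
```

In an actual implementation the output columns would be Dataseries and the missing cells would be recorded the way the library tracks NAs, rather than with a string sentinel.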

That's it. A few hours of work though :-P

Dec 29 '16 08:12 gforge
