torch-dataframe
Using torch-dataframe for time-series
First of all, I would like to congratulate you on this great project.
I would like to know whether torch-dataframe can be used for time-series analysis,
something similar to xts in R.
Example:
TimeSeries1:
| Date | Values1 |
| --- | --- |
| 2016-12-27 21:00:00 | 10.00 |
| 2016-12-27 21:01:00 | 10.01 |
| 2016-12-27 21:02:00 | 10.02 |
| 2016-12-27 21:04:00 | 10.04 |
| 2016-12-27 21:07:00 | 10.07 |
TimeSeries2:
| Date | Values2 |
| --- | --- |
| 2016-12-27 21:00:00 | 20.00 |
| 2016-12-27 21:01:00 | 20.01 |
| 2016-12-27 21:03:00 | 20.03 |
| 2016-12-27 21:05:00 | 20.05 |
| 2016-12-27 21:06:00 | 20.06 |
| 2016-12-27 21:07:00 | 20.07 |
Merge result of TimeSeries1 with TimeSeries2:
| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | NA |
| 2016-12-27 21:03:00 | NA | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | NA |
| 2016-12-27 21:05:00 | NA | 20.05 |
| 2016-12-27 21:06:00 | NA | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
Applying na.locf to the merged TimeSeries
| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:02:00 | 10.02 | 20.01 |
| 2016-12-27 21:03:00 | 10.02 | 20.03 |
| 2016-12-27 21:04:00 | 10.04 | 20.03 |
| 2016-12-27 21:05:00 | 10.04 | 20.05 |
| 2016-12-27 21:06:00 | 10.04 | 20.06 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
Applying na.omit to the merged TimeSeries
| Date | Values1 | Values2 |
| --- | --- | --- |
| 2016-12-27 21:00:00 | 10.00 | 20.00 |
| 2016-12-27 21:01:00 | 10.01 | 20.01 |
| 2016-12-27 21:07:00 | 10.07 | 20.07 |
Thank you very much,
Danilo
This isn't available yet, but it should be easy to implement. The positions of all NAs are easy to identify because they are stored in a tds.Hash with an {id: true} structure, i.e. all you need to do is look up the id number, find the element immediately before it, and use that. Take a look at the Dataseries; you're welcome to add the functionality if you want to. Remember to write specs together with the functionality.
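As a rough illustration of that idea (not torch-dataframe code), last observation carried forward can be sketched on plain Lua tables. The function name `locf`, its arguments, and the ordinary table standing in for the tds.Hash are all assumptions made for this example; a real implementation would live in the Dataseries and work on its internal storage.

```lua
-- Last observation carried forward on a plain Lua array (hypothetical helper).
-- values : array of length n; entries at missing positions are ignored
-- missing: set of missing positions, e.g. {[3] = true}, mirroring {id: true}
-- n      : number of rows
-- Returns the filled array and the set of positions that are still missing
-- (leading NAs have no earlier observation to copy).
local function locf(values, missing, n)
  local filled, still_missing = {}, {}
  local last -- most recent non-missing value seen so far
  for i = 1, n do
    if missing[i] then
      if last ~= nil then
        filled[i] = last
      else
        still_missing[i] = true
      end
    else
      last = values[i]
      filled[i] = last
    end
  end
  return filled, still_missing
end

-- Values2 column of the merged example; rows 3 and 5 are NA (0 is a placeholder)
local values2 = {20.00, 20.01, 0, 20.03, 0, 20.05, 20.06, 20.07}
local filled, still_missing = locf(values2, {[3] = true, [5] = true}, #values2)
print(table.concat(filled, ", "))
```

Walking the rows once and remembering the last seen value avoids searching backwards for every NA, which matters when several NAs follow each other.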
Good afternoon,
I also believe it will be easy to implement the locf function, and I will try. But as I am still learning about the project, I would be grateful if you could show me how the two time series would be merged according to the index column (dateTime). Just a small example, if possible.
Thank you very much,
Danilo
Great, start with writing a spec for the merge with the two dataframes and the desired outcome. I can then try to help you with the details of putting it together.
```lua
require 'Dataframe'

-- First time series: date and priceA
df1 = Dataframe()
date1 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00", "2016-12-27 21:04:00", "2016-12-27 21:07:00" }
value1 = { 10.00, 10.01, 10.02, 10.04, 10.07 }
df1:load_table{data=Df_Dict{date=date1, priceA=value1}}

-- Second time series: date and priceB
df2 = Dataframe()
date2 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00", "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" }
value2 = { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }
df2:load_table{data=Df_Dict{date=date2, priceB=value2}}
```

OK, so you want to do a full join. There is one main problem: no matter how clever our implementation feels, it will most likely be inefficient compared to SQL solutions that have been at it for years. My general design thought for torch-dataframe is to allow simple manipulations and some other things that are good to have when building and training models. Implementing hard-core joins has therefore not been something I've aimed at. I personally prepare my datasets in R and export them to CSV before importing them into Torch. R has the dplyr package, which is excellent for all kinds of merges etc.
Anyway, if you still want to embark on implementing the merge, then:
- Create a timestamp type. Add a `to_timestamp` to the Dataseries, since a string will be terribly inefficient to work with. I know the `to_categorical` function is rather slow and this will be even worse; consider doing this in C if you have large datasets. There is a SO post that may be helpful.
- Create a `sort` for both tables using `torch.sort`'s second return value, which retrieves the indexes of the sort. Note that you will need to mask the missing data when sorting and append the missing elements at the end.
- The `full_join` function should:
  - Sort both tables on the merge key and find the number of mismatches
  - Create new Dataseries of the correct type (or a Dataframe using `add_column`) that can contain the merged dataset
  - Loop through both frames at once and fill in the values in the Dataseries previously created. There will be three indexes: one for the first dataframe, one for the second dataframe and one for the output series/frame
  - Create and return a new Dataframe with the Dataseries (you could possibly start with creating a new Dataframe of the correct size and setting its elements)
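
The steps above can be sketched in plain Lua to make the three-index loop concrete. This is only an illustration under stated assumptions: `to_timestamp`, `sort_order` and `full_join` are hypothetical helpers written for this example, ordinary Lua tables stand in for the Dataseries, `table.sort` on an index array plays the role of `torch.sort`'s second return value, keys are assumed unique and non-missing within each input, and the output is grown dynamically rather than pre-sized from the number of mismatches.

```lua
-- Parse "YYYY-MM-DD HH:MM:SS" into seconds since the epoch.
-- os.time interprets the fields in the local time zone.
local function to_timestamp(str)
  local y, m, d, H, M, S = str:match("^(%d+)-(%d+)-(%d+) (%d+):(%d+):(%d+)$")
  assert(y, "unrecognised date format: " .. tostring(str))
  return os.time{year = tonumber(y), month = tonumber(m), day = tonumber(d),
                 hour = tonumber(H), min = tonumber(M), sec = tonumber(S)}
end

-- Return the permutation that sorts `keys` in ascending order.
local function sort_order(keys)
  local order = {}
  for i = 1, #keys do order[i] = i end
  table.sort(order, function(a, b) return keys[a] < keys[b] end)
  return order
end

-- Full outer join of two keyed columns.
-- Missing cells are recorded in the sets na1/na2 ({row = true}).
local function full_join(keys1, vals1, keys2, vals2)
  local o1, o2 = sort_order(keys1), sort_order(keys2)
  local out = {key = {}, v1 = {}, v2 = {}, na1 = {}, na2 = {}}
  local i, j, k = 1, 1, 0 -- index into frame 1, frame 2 and the output
  while i <= #o1 or j <= #o2 do
    k = k + 1
    local a = i <= #o1 and keys1[o1[i]] or nil
    local b = j <= #o2 and keys2[o2[j]] or nil
    if b == nil or (a ~= nil and a < b) then
      -- key present only in frame 1
      out.key[k], out.v1[k], out.na2[k] = a, vals1[o1[i]], true
      i = i + 1
    elseif a == nil or b < a then
      -- key present only in frame 2
      out.key[k], out.v2[k], out.na1[k] = b, vals2[o2[j]], true
      j = j + 1
    else
      -- key present in both frames
      out.key[k], out.v1[k], out.v2[k] = a, vals1[o1[i]], vals2[o2[j]]
      i, j = i + 1, j + 1
    end
  end
  out.n = k -- v1/v2 contain holes, so keep the row count explicitly
  return out
end

-- Convert an array of date strings to timestamps
local function timestamps(dates)
  local out = {}
  for idx, s in ipairs(dates) do out[idx] = to_timestamp(s) end
  return out
end

local date1 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:02:00", "2016-12-27 21:04:00", "2016-12-27 21:07:00" }
local value1 = { 10.00, 10.01, 10.02, 10.04, 10.07 }
local date2 = { "2016-12-27 21:00:00", "2016-12-27 21:01:00", "2016-12-27 21:03:00", "2016-12-27 21:05:00", "2016-12-27 21:06:00", "2016-12-27 21:07:00" }
local value2 = { 20.00, 20.01, 20.03, 20.05, 20.06, 20.07 }

local merged = full_join(timestamps(date1), value1, timestamps(date2), value2)
for row = 1, merged.n do
  print(os.date("%Y-%m-%d %H:%M:%S", merged.key[row]),
        merged.na1[row] and "NA" or merged.v1[row],
        merged.na2[row] and "NA" or merged.v2[row])
end
```

Keeping the missing positions in separate sets rather than storing a sentinel value follows the {id: true} layout mentioned earlier in the thread, so a locf or omit step could be applied to the result afterwards.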
That's it. A few hours of work though :-P