TimeSeries.jl

Shouldn't this package be called TimeArrays.jl?

Open GordStephen opened this issue 9 years ago • 12 comments

It's not that important, but since this package implements a data structure called TimeArray, shouldn't the package be called TimeArrays.jl?

Eventually (once TimeModels.jl is in better shape and integrates more tightly with this package) TimeSeries.jl could be a good name for a meta-package that reexports the TimeArray type and the time series models from TimeModels, and adds diagnostic tests and maybe some visualization functions along the lines of those in https://github.com/GordStephen/TimeSeriesTools.jl/blob/master/src/visualizations.jl. That's another discussion though...

GordStephen, Dec 16 '15 19:12

I think at one point I had that as a package name. Changing the name would create a lot of chaos. Not judging if it's worth it, just an observation.

What about merging the TimeModels.jl into TimeSeries?

That would go contrary to the initial goals of the package to remain lightweight of course. This would make a case for splitting out TimeArrays.jl as a separate package that would meet the goals of remaining lightweight.

Which brings me to your initial idea ... something worth considering.

There is also the matter that maybe TimeArrays are not the best way to represent time series in Julia. The Timestamps.jl package experiments with a different approach, but I don't think it's there yet.

Ideally, and I haven't grokked how this works, TimeSeries hijacks Julia's array implementation and replaces row numbers with DateTime values. This will require hacking into the C code of course.

milktrader, Dec 18 '15 14:12
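For readers following the idea of replacing row numbers with date values: one way to get date-based row lookup at the package level is to overload `Base.getindex` on a wrapper type. The sketch below is only an illustration under stated assumptions. `DateIndexed` is a hypothetical name, not the package's TimeArray, and it assumes the timestamps are sorted and unique.

```julia
using Dates

# Hypothetical type for illustration; not the package's TimeArray.
struct DateIndexed{T}
    timestamps::Vector{Date}   # assumed sorted ascending and unique
    values::Matrix{T}          # one row per timestamp
    function DateIndexed(timestamps::Vector{Date}, values::Matrix{T}) where {T}
        length(timestamps) == size(values, 1) ||
            throw(DimensionMismatch("need one row per timestamp"))
        issorted(timestamps) || throw(ArgumentError("timestamps must be sorted"))
        new{T}(timestamps, values)
    end
end

# Look up a row by date instead of by row number.
function Base.getindex(d::DateIndexed, t::Date)
    i = searchsortedfirst(d.timestamps, t)   # binary search on the sorted index
    (i <= length(d.timestamps) && d.timestamps[i] == t) || throw(KeyError(t))
    return d.values[i, :]
end

dates = collect(Date(2001, 1, 1):Day(1):Date(2001, 1, 5))
d = DateIndexed(dates, reshape(collect(1.0:10.0), 5, 2))
d[Date(2001, 1, 3)]   # the third row of the matrix
```

Because the timestamps are sorted, the lookup is a binary search rather than a scan. Whether this is fast enough for the use cases discussed here is a separate question.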

Thinking about it more, I like the idea of keeping TimeModels as a separate package that only operates on vanilla Arrays - but maybe a package called TimeSeries provides the "smart" glue (missing data awareness, etc) between those models and the TimeArray data structure (while the core data structure itself might be split out into its own package to keep it light).

To digress again, in that scenario, TimeSeries would be the "flagship" time series analysis package for data processing, interactive visualization, descriptive statistics, model fitting, etc, tying together a suite of specialized packages such as TimeModels, TimeArrays (or whatever it would be called), etc. Visualizations and statistical tests could be part of the core or possibly split into their own packages as well. I'm also not currently aware of missing-data-robust ACF/PACF/CCF and periodogram functions in Julia - there might be an argument to include those there as well?

GordStephen, Dec 28 '15 17:12
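To make "missing-data-robust ACF" concrete, here is a minimal sketch of a lag-k autocorrelation that skips any pair with a missing value. It is an assumption-laden illustration: `acf_skipmissing` is a hypothetical helper, not part of TimeSeries.jl or TimeModels.jl, and the normalization (mean and variance taken over all observed values) is only one of several conventions in use.

```julia
using Statistics

# Hypothetical helper, not part of TimeSeries.jl or TimeModels.jl.
# Lag-k autocorrelation using only the pairs (x[t], x[t+k]) where both are present.
function acf_skipmissing(x::AbstractVector, k::Integer)
    0 <= k < length(x) || throw(ArgumentError("lag out of range"))
    obs = collect(skipmissing(x))      # observed values only
    μ = mean(obs)
    denom = sum(abs2, obs .- μ)        # sum of squared deviations of observed values
    num = 0.0
    for t in 1:(length(x) - k)
        a, b = x[t], x[t + k]
        (ismissing(a) || ismissing(b)) && continue   # drop incomplete pairs
        num += (a - μ) * (b - μ)
    end
    return num / denom
end

x = [1.0, 2.0, missing, 4.0, 5.0, 6.0]
acf_skipmissing(x, 0)   # approximately 1.0 by construction
acf_skipmissing(x, 1)
```

With no missing values this reduces to the usual biased sample autocorrelation. With gaps, fewer pairs contribute to the numerator, so estimates at a given lag shrink toward zero as more data goes missing.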

I recently started using this package (thanks for the great work!) and tend to agree that TimeSeries.jl is a bad match for the current functionality in the package, which is mostly about particular data structures for working with time series and less about frameworks for doing things. I like the idea that @GordStephen mentions of renaming and/or refactoring out a TimeArrays.jl.

Furthermore I'll say that time series representation and analysis are rather broad topics, to the extent that any "framework for working with time series" will likely serve rather particular purposes while leaving others out. This is, of course, totally fine, but in this case it would be nice to have a name that reflects the situation, as per the Julia guidelines on package names.

gajomi, Feb 08 '16 08:02

I'm coming around to the same view, that of factoring out TimeArrays.jl. This data structure has been able to survive a couple years, but I'm still holding out for a faster implementation.

Broader goals for TimeSeries package

  • [ ] allows different data structures (TimeArray, Timestamp, DataFrame ...)
  • [ ] diagnostic time series tests
  • [ ] visualization
  • [ ] AR modeling
  • [ ] GARCH modeling

milktrader, Feb 08 '16 14:02

I think the "Array" part is slightly misleading here because the name mimics Base's arrays, which are N-dimensional, but the objects in this package are basically 2D (the 1D case can here be considered a column matrix). You can construct a TimeArray with higher dimension, but it doesn't make sense since DataArrays have columnnames, percentchange, and diff:

julia> ts = TimeArray(collect(Date("2001-01-01"):Date("2001-01-20")), randn(20,2,2,2), ["TS1", "TS2"])
20x2 TimeSeries.TimeArray{Float64,4,Date,Array{Float64,4}} 2001-01-01 to 2001-01-20

             TS1     TS2     
2001-01-01 | -1.2667 0.8246  
2001-01-02 | -0.3207 -0.1153 
2001-01-03 | 0.3684  0.3748  
2001-01-04 | 1.8648  -0.2656 
⋮
2001-01-17 | 1.3288  1.9265  
2001-01-18 | 1.1369  1.4096  
2001-01-19 | -1.5773 -1.7937 
2001-01-20 | -1.6845 0.3109  

julia> ts[Date("2001-01-01")]
ERROR: column names must match width of array
 in call at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:19
 in getindex at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:141
 in getindex at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:184

andreasnoack, Feb 28 '16 16:02
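In the session above, the 4D array is accepted at construction and the failure only appears on indexing. One possible remedy, if higher-dimensional arrays are not meant to be supported, is to reject them up front. The sketch below shows such a check; `check_timearray_args` is a hypothetical name written for illustration and is not TimeSeries.jl API.

```julia
using Dates

# Hypothetical constructor-time check, written for illustration only;
# the names below are not TimeSeries.jl API.
function check_timearray_args(timestamps::AbstractVector, values::AbstractArray,
                              colnames::AbstractVector{<:AbstractString})
    ndims(values) <= 2 ||
        throw(ArgumentError("values must be 1D or 2D, got $(ndims(values))D"))
    length(timestamps) == size(values, 1) ||
        throw(DimensionMismatch("need one row per timestamp"))
    length(colnames) == size(values, 2) ||
        throw(DimensionMismatch("column names must match width of array"))
    return true
end

dates = collect(Date(2001, 1, 1):Day(1):Date(2001, 1, 20))
check_timearray_args(dates, randn(20, 2), ["TS1", "TS2"])          # passes
# check_timearray_args(dates, randn(20, 2, 2, 2), ["TS1", "TS2"])  # throws ArgumentError
```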

I've been thinking about that recently as well. One option would be to properly support N-dimensional arrays, where the first dimension is time and the rest are categorical (ideally symbols, not strings). I can see situations where this would be useful, although obviously the 2D case would be most common. As you say, the other option would be to explicitly not support higher-dimensional arrays, in which case a name like TimeFrames or TimePanels would make more sense.

Personally I'm partial to the first case for maximum generality, although I'm sure there are implications I haven't considered... If we were to go down that path it might make sense to build on top of something like NamedArrays.jl or AxisArrays.jl.

GordStephen, Feb 29 '16 04:02
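As a rough picture of the first option (time along the first dimension, symbol labels on the rest), here is a minimal sketch. `TimeNDArray` and `at` are hypothetical names for illustration and belong to neither TimeSeries.jl, NamedArrays.jl, nor AxisArrays.jl. A real implementation would more likely build on one of those packages.

```julia
using Dates

# Hypothetical type for illustration; not an existing package's API.
struct TimeNDArray{T,N}
    timestamps::Vector{Date}          # labels for dimension 1 (time)
    labels::Vector{Vector{Symbol}}    # labels for dimensions 2..N
    values::Array{T,N}
    function TimeNDArray(timestamps::Vector{Date}, labels::Vector{Vector{Symbol}},
                         values::Array{T,N}) where {T,N}
        length(timestamps) == size(values, 1) ||
            throw(DimensionMismatch("need one slice per timestamp"))
        length(labels) == N - 1 ||
            throw(DimensionMismatch("need labels for every non-time dimension"))
        for (d, l) in enumerate(labels)
            length(l) == size(values, d + 1) ||
                throw(DimensionMismatch("labels for dimension $(d + 1) have wrong length"))
        end
        new{T,N}(timestamps, labels, values)
    end
end

# Select everything recorded at time t; the result keeps the categorical dimensions.
function at(a::TimeNDArray, t::Date)
    i = findfirst(==(t), a.timestamps)
    i === nothing && throw(KeyError(t))
    return selectdim(a.values, 1, i)
end

dates = collect(Date(2001, 1, 1):Day(1):Date(2001, 1, 20))
a = TimeNDArray(dates, [[:TS1, :TS2], [:low, :high]], randn(20, 2, 2))
size(at(a, Date(2001, 1, 1)))   # (2, 2)
```

Validating every label vector at construction avoids the situation shown earlier in the thread, where a mismatch between column names and array width only surfaces on indexing.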

N-dimensional arrays would be useful for the climate sciences, which often deal with time x longitude x latitude 3D arrays. Right now I'm using AxisArrays.jl for that, but being able to use the tools developed here might be useful.

Balinus, Mar 14 '17 18:03

What's the advantage of a 3D array for lat/long over simply having two columns in a 2D array?

milktrader, Mar 16 '17 13:03

I guess it simply involves less reshaping in the end. The raw file is usually in netCDF or GRIB, which return 3D arrays natively (and 4D arrays in the case of 3D fields such as pressure). It's much less hassle to simply leave the data in its native shape, specifically for mapping the results, as the mapping packages often expect the results to be on a grid.

A lot of the manipulation of the simulations involves time operators, for which TimeSeries.jl would be an ideal candidate.

I'm thinking out loud here, but would it be possible to simply build an AxisArray with vectors of TimeSeries? Would there be any advantages in the end? @mbauman @timholy

Balinus, Mar 16 '17 13:03
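To illustrate what a time operator on the native shape could look like, here is a minimal sketch of a monthly mean over the first dimension of a time x longitude x latitude array, with no reshaping. `monthly_mean` is a hypothetical helper, not an existing function in TimeSeries.jl or AxisArrays.jl, and the all-ones field is placeholder data.

```julia
using Dates, Statistics

# Hypothetical helper for illustration: mean over time for each calendar month,
# applied to an array whose first dimension is time.
function monthly_mean(timestamps::Vector{Date}, values::AbstractArray)
    length(timestamps) == size(values, 1) ||
        throw(DimensionMismatch("need one slice per timestamp"))
    months = unique(map(t -> (year(t), month(t)), timestamps))
    out = similar(values, float(eltype(values)),
                  (length(months), size(values)[2:end]...))
    for (j, m) in enumerate(months)
        rows = findall(t -> (year(t), month(t)) == m, timestamps)
        # selectdim leaves the trailing (e.g. lon x lat) dimensions untouched
        selectdim(out, 1, j) .= dropdims(mean(selectdim(values, 1, rows); dims=1); dims=1)
    end
    return months, out
end

dates = collect(Date(2001, 1, 1):Day(1):Date(2001, 2, 28))
field = ones(length(dates), 4, 3)          # time x longitude x latitude
months, means = monthly_mean(dates, field)
size(means)   # (2, 4, 3)
```

The result keeps the grid dimensions, so it can be passed to a mapping routine that expects gridded data.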

I haven't worked on it in a while, but @Balinus you may find https://github.com/GordStephen/TimeAxisArrays.jl useful - it implements most of the time series functionality in this package but over AxisArrays. Currently the non-temporal dimensions are required to be categorical, but relaxing that constraint seems reasonable.

GordStephen, Mar 16 '17 14:03

Nice, thanks @GordStephen ! I will check it out asap.

The package does not work out of the box, so I'll probably look at it more deeply next week, when I have some time.

Balinus, Mar 16 '17 15:03

Ok - I've been wanting to give it more attention recently, so certainly don't hesitate to open issues as you run into problems - that will no doubt motivate me a bit more ;)

GordStephen, Mar 16 '17 15:03