TimeSeries.jl
Shouldn't this package be called TimeArrays.jl?
It's not that important, but since this package implements a data structure called TimeArray, shouldn't the package be called TimeArrays.jl?
Eventually (once TimeModels.jl is in better shape and integrates more tightly with this package) TimeSeries.jl could be a good name for a meta-package that reexports the TimeArray type and the time series models from TimeModels, and adds diagnostic tests and maybe some visualization functions along the lines of those in https://github.com/GordStephen/TimeSeriesTools.jl/blob/master/src/visualizations.jl. That's another discussion though...
I think at one point I had that as a package name. Changing the name would create a lot of chaos. Not judging if it's worth it, just an observation.
What about merging TimeModels.jl into TimeSeries?
That would go contrary to the initial goals of the package to remain lightweight of course. This would make a case for splitting out TimeArrays.jl as a separate package that would meet the goals of remaining lightweight.
Which brings me to your initial idea ... something worth considering.
There is also the matter that maybe TimeArrays are not the best way to represent time series in Julia. The Timestamps.jl package experiments with a different approach, but I don't think it's there yet.
Ideally, and I haven't grokked how this would work, TimeSeries would hijack Julia's array implementation and replace row numbers with DateTime values. This would require hacking into the C code, of course.
Thinking about it more, I like the idea of keeping TimeModels as a separate package that only operates on vanilla Arrays, but maybe a package called TimeSeries provides the "smart" glue (missing data awareness, etc.) between those models and the TimeArray data structure (while the core data structure itself might be split out into its own package to keep it light).
To digress again, in that scenario, TimeSeries would be the "flagship" time series analysis package for data processing, interactive visualization, descriptive statistics, model fitting, etc., tying together a suite of specialized packages such as TimeModels, TimeArrays (or whatever it would be called), etc. Visualizations and statistical tests could be part of the core or possibly split into their own packages as well. I'm also not currently aware of missing-data-robust ACF/PACF/CCF and periodogram functions in Julia - there might be an argument to include those there as well?
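As a rough illustration of what a missing-data-aware ACF could look like, here is a minimal sketch. It assumes a current Julia 1.x with the built-in `missing` value; `acf_skipmissing` is a hypothetical name, not a function from TimeSeries.jl or TimeModels.jl. It uses one simple convention (mean and variance over the observed values, lagged products over complete pairs only); other normalizations are equally defensible.

```julia
# Hypothetical helper for illustration; not part of TimeSeries.jl or TimeModels.jl.
# Lag-k sample autocorrelation that skips every pair containing a missing value.
function acf_skipmissing(x::AbstractVector, lag::Integer)
    0 <= lag < length(x) || throw(ArgumentError("lag must be in 0:length(x)-1"))
    observed = collect(skipmissing(x))
    isempty(observed) && return NaN
    μ = sum(observed) / length(observed)           # mean over observed values only
    denom = sum(v -> (v - μ)^2, observed)          # variation over observed values
    numer = 0.0
    for t in 1:(length(x) - lag)
        a, b = x[t], x[t + lag]
        (ismissing(a) || ismissing(b)) && continue # drop incomplete pairs
        numer += (a - μ) * (b - μ)
    end
    return numer / denom
end

x = [1.0, missing, 3.0]
acf_skipmissing(x, 2)   # uses the single complete pair (1.0, 3.0)
```

A glue package in the sense described above could apply a function like this to the values of a TimeArray column and leave the model code itself working on plain arrays.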
I recently started using this package (thanks for the great work!) and tend to agree that TimeSeries.jl is a bad match for the current functionality in the package, which is mostly about particular data structures for working with time series and less about frameworks for doing things. I like the idea that @GordStephen mentions of renaming and/or refactoring out a TimeArrays.jl.
Furthermore I'll say that time series representation and analysis are rather broad topics, to the extent that any "framework for working with time series" will likely serve rather particular purposes while leaving others out. This is, of course, totally fine, but in this case it would be nice to have a name that reflects the situation, as per the Julia guidelines on package names.
I'm coming around to the same view, that of factoring out TimeArrays.jl. This data structure has been able to survive a couple years, but I'm still holding out for a faster implementation.
Broader goals for TimeSeries package
- [ ] allows different data structures (TimeArray, Timestamp, DataFrame ...)
- [ ] diagnostic time series tests
- [ ] visualization
- [ ] AR modeling
- [ ] GARCH modeling
I think the "Array" part is slightly misleading here because the name mimics Base's arrays, which are N-dimensional, but the objects in this package are basically 2D (the 1D case can be considered a column matrix here). You can construct a TimeArray with higher dimension, but it doesn't make sense since DataArrays have columnnames, percentchange, and diff:
julia> ts = TimeArray(collect(Date("2001-01-01"):Date("2001-01-20")), randn(20,2,2,2), ["TS1", "TS2"])
20x2 TimeSeries.TimeArray{Float64,4,Date,Array{Float64,4}} 2001-01-01 to 2001-01-20

             TS1      TS2
2001-01-01 | -1.2667   0.8246
2001-01-02 | -0.3207  -0.1153
2001-01-03 |  0.3684   0.3748
2001-01-04 |  1.8648  -0.2656
⋮
2001-01-17 |  1.3288   1.9265
2001-01-18 |  1.1369   1.4096
2001-01-19 | -1.5773  -1.7937
2001-01-20 | -1.6845   0.3109

julia> ts[Date("2001-01-01")]
ERROR: column names must match width of array
 in call at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:19
 in getindex at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:141
 in getindex at /Users/andreasnoack/.julia/v0.4/TimeSeries/src/timearray.jl:184
I've been thinking about that recently as well. One option would be to properly support N-dimensional arrays, where the first dimension is time and the rest are categorical (ideally symbols, not strings). I can see situations where this would be useful, although obviously the 2D case would be most common. As you say, the other option would be to explicitly not support higher-dimensional arrays, in which case a name like TimeFrames or TimePanels would make more sense.
Personally I'm partial to the first case for maximum generality, although I'm sure there are implications I haven't considered... If we were to go down that path it might make sense to build on top of something like NamedArrays.jl or AxisArrays.jl.
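To make the first option concrete, here is a minimal sketch of an N-dimensional container with time along the first dimension and symbol labels on the others. It assumes a current Julia 1.x; `TimeNDArray` and `at` are hypothetical names for illustration and are not the TimeArray type from this package, nor anything from NamedArrays.jl or AxisArrays.jl.

```julia
using Dates

# Hypothetical type for illustration; not the TimeSeries.jl TimeArray.
# Dimension 1 is time; each remaining dimension carries a vector of Symbol labels.
struct TimeNDArray{T,N,L<:Tuple}
    timestamps::Vector{Date}
    values::Array{T,N}
    labels::L

    function TimeNDArray(timestamps::Vector{Date}, values::Array{T,N}, labels::Tuple) where {T,N}
        size(values, 1) == length(timestamps) ||
            throw(DimensionMismatch("timestamps must match the first dimension"))
        length(labels) == N - 1 ||
            throw(DimensionMismatch("need one label vector per non-time dimension"))
        for (d, names) in enumerate(labels)
            length(names) == size(values, d + 1) ||
                throw(DimensionMismatch("labels for dimension $(d + 1) have the wrong length"))
        end
        new{T,N,typeof(labels)}(timestamps, values, labels)
    end
end

# Look up one date; returns an (N-1)-dimensional view of the values.
function at(ta::TimeNDArray, d::Date)
    i = findfirst(==(d), ta.timestamps)
    i === nothing && throw(KeyError(d))
    return selectdim(ta.values, 1, i)
end

dates = collect(Date(2001, 1, 1):Day(1):Date(2001, 1, 3))
ta = TimeNDArray(dates, reshape(collect(1.0:12.0), 3, 2, 2), ([:a, :b], [:x, :y]))
slice = at(ta, Date(2001, 1, 2))   # a 2x2 view for that date
```

Checking the labels per dimension at construction time is one way to avoid the kind of late failure shown in the indexing error earlier in the thread.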
N-dimensional arrays would be useful for climate science, which often deals with time x longitude x latitude 3D arrays. Right now I'm using AxisArrays.jl for that, but being able to use the tools developed here might be useful.
What's the advantage of a 3D array for lat/long over simply having two columns in a 2D array?
I guess it simply involves less reshaping in the end. The raw file is usually in netCDF or GRIB, which return 3D arrays natively (and 4D arrays in the case of 3D fields such as pressure). It's much less hassle to simply leave the data in its native shape, specifically for mapping the results, as the mapping packages often expect the results to be on a grid.
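For reference, the reshaping in question looks roughly like this in plain Julia. This is a sketch with made-up toy dimensions and synthetic values, assuming a current Julia 1.x; real data would come from a netCDF or GRIB reader, which is not shown.

```julia
# Toy dimensions and synthetic values, assumed for illustration only.
ntime, nlon, nlat = 4, 3, 2
grid = reshape(collect(1.0:(ntime * nlon * nlat)), ntime, nlon, nlat)

# Flatten to time x (lon*lat): each grid cell becomes one column.
# Column k corresponds to (lon i, lat j) with k = i + (j - 1) * nlon.
flat = reshape(grid, ntime, nlon * nlat)

# Apply a time operation column-wise, here a first difference along time.
dflat = diff(flat; dims=1)

# Restore the grid shape so the result can be mapped.
dgrid = reshape(dflat, ntime - 1, nlon, nlat)
```

The round trip is cheap because `reshape` does not copy the data, but the column-to-cell bookkeeping is exactly the hassle that keeping the native 3D shape avoids.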
A lot of the manipulation of the simulations involves time operators, for which TimeSeries.jl would be an ideal candidate.
I'm thinking out loud here, but would it be possible to simply build an AxisArray with vectors of TimeSeries? Would there be any advantages in the end? @mbauman @timholy
I haven't worked on it in a while, but @Balinus you may find https://github.com/GordStephen/TimeAxisArrays.jl useful - it implements most of the time series functionality in this package but over AxisArrays. Currently the non-temporal dimensions are required to be categorical, but relaxing that constraint seems reasonable.
Nice, thanks @GordStephen! I will check it out asap.
The package does not work out-of-the-box, so I'll look at it probably next week, when I have some time to look at it more deeply.
Ok - I've been wanting to give it more attention recently, so certainly don't hesitate to open issues as you run into problems - that will no doubt motivate me a bit more ;)