TimeSeries.jl
Liberate TimeArrays from TimeTypes/Vectors
I would like to suggest that it is an "issue" that timestamps are constrained to be Vector{D<:TimeType}. Be forewarned that a wall of text follows [ hopefully I am not arguing too forcefully :) ].
Arguments for relaxing the TimeType eltype restriction:
- Not all recorded times will be TimeTypes. Nearly every laboratory I have been in that develops its own equipment ends up rolling out one or more custom notions of time. These may not always be easy to map onto an extant concrete TimeType (as Stefan Karpinski recently brought up here, "time" could depend on esoteric details of the earth's rotation). It is not uncommon in these cases to represent time by some numeric primitive or a simple data structure, and it would be nice if TimeArrays easily accommodated this. Having said that, the various TimeType types could still be the officially supported timestamps in the package.
- In a simulation context there is often no natural starting epoch from which to count, so it is natural to start from one or zero. Furthermore, it is common to non-dimensionalize the model equations so that the simulation "time" parameter is dimensionless.
- Sometimes you want floating point timestamps. Floating point timestamps are quite natural for many models (e.g. point processes on the real line, intervals of an adaptive ODE integrator). Floating point types can also make sense in experimental contexts when one is merging timestamps from data at wildly different scales (as in geophysics, cosmology, etc.).
Argument for relaxing the Vector container restriction:
The main reason is that it would allow one to encode information about the sampling in the timestamp type. A key example would be allowing for Range-like containers:
julia> using Base.Dates
julia> TimeArray(range(now(),Second(1),5),0:4,[""])
5x1 TimeSeries.TimeArray{Int64,1,DateTime,StepRange{DateTime,Base.Dates.Second},UnitRange{Int64}} 2016-02-26T17:57:55 to 2016-02-26T17:57:59
2016-02-26T17:57:55 | 0
2016-02-26T17:57:56 | 1
2016-02-26T17:57:57 | 2
2016-02-26T17:57:58 | 3
2016-02-26T17:57:59 | 4
In this example one can tell that the data was sampled regularly every second without needing to iterate over the whole timestamp. More importantly, it also allows one to dispatch on the timestamp container type. Many simulations and analyses can be optimized when it is known that the sampling frequency is regular. Furthermore, there are some common and important functions, like fft, that you would only want to call on regularly spaced data. For these reasons it makes sense to me that TimeArrays have first class support for regularly spaced data. By keeping the type totally open, however, one could also accommodate "semiregularly" spaced data (e.g. data collected every weekday, every day but Sunday, etc.).
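To make the dispatch argument concrete, here is a minimal sketch, assuming Julia 1.x (where Dates is a standard library rather than Base.Dates). The function name isregular is hypothetical and not part of TimeSeries.jl; it only illustrates how a range-typed timestamp lets the regular case be answered without looking at the data.

```julia
using Dates

# Hypothetical helper (not a TimeSeries.jl function): are the timestamps evenly spaced?
# Ranges are regular by construction, so dispatch answers this in constant time.
isregular(ts::AbstractRange) = true

# Generic fallback: compare each consecutive difference against the first one.
function isregular(ts::AbstractVector)
    length(ts) < 3 && return true
    step = ts[2] - ts[1]
    return all(ts[i+1] - ts[i] == step for i in 2:length(ts)-1)
end

t0 = DateTime(2016, 2, 26, 17, 57, 55)
regular = t0:Second(1):(t0 + Second(4))            # a StepRange of DateTime
irregular = [t0, t0 + Second(1), t0 + Second(5)]   # a plain Vector{DateTime}

isregular(regular)     # resolved by the AbstractRange method
isregular(irregular)   # falls back to the O(n) scan
```

A function such as fft could use the same pattern: define the method only for range-typed timestamps, or have the generic method check isregular first.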
So these are my arguments. Assuming this looks favorable, there is an open question about implementation details, documentation and what other work this might open up. A WIP proof of concept can be found here #251.
I like this, especially the range containers - I don't see any reason not to move on that part right away. Evenly-spaced timestamp vectors could even be automatically converted to a range by constructors.
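A rough sketch of what that automatic conversion might look like, assuming Julia 1.x. The name compress_timestamps is made up for illustration, and the comparison uses exact equality, which suits Dates and integers but would need a tolerance for floating point timestamps.

```julia
using Dates

# Hypothetical helper: return an equivalent range when the timestamps are
# evenly spaced, otherwise return the input unchanged.
function compress_timestamps(ts::AbstractVector)
    length(ts) < 2 && return ts
    step = ts[2] - ts[1]
    iszero(step) && return ts              # a zero step cannot form a range
    for i in 2:length(ts)-1
        ts[i+1] - ts[i] == step || return ts
    end
    return ts[1]:step:ts[end]
end

# Already a range: nothing to do.
compress_timestamps(ts::AbstractRange) = ts

t0 = DateTime(2016, 2, 26, 17, 57, 55)
even = [t0, t0 + Second(1), t0 + Second(2)]
uneven = [t0, t0 + Second(1), t0 + Second(5)]

r = compress_timestamps(even)     # a range with a one-second (1000 ms) step
v = compress_timestamps(uneven)   # the original Vector, untouched
```

Note that the return type depends on the data, so a constructor doing this would not be type-stable; whether that trade-off is acceptable is a design question for the package.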
Relaxing the TimeType constraint would be trickier. Ideally there would be some sort of general subtractable (and thus also orderable) type constraint on the timestamp values. Enforcing this might involve traits or protocols, which would complicate things a little, but in principle I think it's a good idea.
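One way the trait idea could be sketched is the "Holy traits" pattern, shown below under the assumption of Julia 1.x. All names here (TimestampStyle, IsTimestamp, NotTimestamp, elapsed) are hypothetical; the sketch relies on types opting in explicitly rather than on any compiler-checked "subtractable" constraint, which Julia does not provide.

```julia
using Dates

# Hypothetical trait: does this element type behave like a timestamp
# (ordered, and supporting subtraction)?
abstract type TimestampStyle end
struct IsTimestamp <: TimestampStyle end
struct NotTimestamp <: TimestampStyle end

TimestampStyle(::Type) = NotTimestamp()              # default: not a timestamp
TimestampStyle(::Type{<:TimeType}) = IsTimestamp()   # Date, DateTime, Time, ...
TimestampStyle(::Type{<:Real}) = IsTimestamp()       # integer or float "time"

# Example consumer: the span covered by a (sorted) timestamp container.
elapsed(ts::AbstractVector) = elapsed(TimestampStyle(eltype(ts)), ts)
elapsed(::IsTimestamp, ts) = last(ts) - first(ts)
elapsed(::NotTimestamp, ts) =
    throw(ArgumentError("eltype $(eltype(ts)) is not a supported timestamp type"))

elapsed([0.0, 0.5, 1.5])                               # Float64 timestamps
elapsed([Date(2016, 2, 26), Date(2016, 2, 29)])        # a Day period
```

A lab with its own notion of time would then add one line, TimestampStyle(::Type{MyClock}) = IsTimestamp(), for its (hypothetical) MyClock type.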
I'm also interested in this feature, for numerical simulations.
Enforcing this might involve traits or protocols, which would complicate things a little, but in principle I think it's a good idea.
Could you expand on that? I haven't looked into TimeSeries internals, but wouldn't it be sufficient to call issorted on the input?
Yeah, I think it would - I was hoping for a type-level constraint, but that's probably not a possibility we'll see in the short term. issorted is the approach AxisArrays takes, and I'm using AxisArrays as a base for a timeseries-specific extension, TimeAxisArrays, which addresses this issue among others. That package is still mostly undocumented (and feature-incomplete), but I'll post back here when it's more ready for primetime.
I was hoping for a type-level constraint, but that's probably not a possibility we'll see in the short term
Do you mean that we should have a type SortedVector(::Vector), which would enforce the ordering? It sounds reasonable to me. Base already has UpperTriangular, Symmetric, etc.
Ah, that's an interesting thought! Not what I had in mind, but it would probably work just as well.
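For reference, the SortedVector idea could look roughly like the following, assuming Julia 1.x. This is only a sketch of the wrapper-type approach, not an existing type in Base or TimeSeries.jl: it checks issorted once at construction and omits setindex!, so the ordering cannot be broken through the wrapper itself (mutating the wrapped array directly would still bypass the check).

```julia
# Hypothetical read-only wrapper that guarantees its contents were sorted
# at construction time.
struct SortedVector{T,V<:AbstractVector{T}} <: AbstractVector{T}
    data::V
    function SortedVector(data::AbstractVector{T}) where {T}
        issorted(data) || throw(ArgumentError("timestamps must be sorted"))
        return new{T,typeof(data)}(data)
    end
end

# Minimal AbstractArray interface; no setindex!, so the wrapper is read-only.
Base.size(s::SortedVector) = size(s.data)
Base.getindex(s::SortedVector, i::Int) = s.data[i]
Base.IndexStyle(::Type{<:SortedVector}) = IndexLinear()

s = SortedVector([0.0, 0.5, 2.0])   # wraps a Vector{Float64}
r = SortedVector(0.0:0.5:2.0)       # the wrapped range type stays in the type parameters
```

Because the wrapped container type V is a type parameter, methods can still dispatch on it, e.g. SortedVector{T,<:AbstractRange} where T for regularly sampled data.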