ta4j
Support for time indexed time series
Time based series
The base implementation of time series uses bar indices as a means to access data: getBar(int index). As shown in #382, a mapping between different series is sometimes desired, in which case a simple index based solution does not work. To solve this issue, time series should be able to expose their indices via a timestamp. We could use a NavigableMap to implement this feature.
public interface TimeIndexedTimeSeries extends TimeSeries {
/**
* @param time the time stamp to look up
* @return the bar containing data for this time stamp
*/
default Bar getBar(ZonedDateTime time) {
return getBar(getIndex(time));
}
/**
* @param time the time stamp to look up
* @return the index of the series containing data for this time stamp
*/
int getIndex(ZonedDateTime time);
}
This approach only works if the bar indices of the list do not change once data is added; otherwise we have to update the content of the map. Your roadmap states that you want to remove constrained from the BaseTimeSeries. As far as I have seen, data is only appended at the end of the list.
Event handler:
Referencing #382 once again: I created a small converter that converts a time series to a lower resolution (e.g. a 5 minute series to a 15 or 30 minute series). This might be an interesting feature to consider, but to stay in a valid state the connected series needs to be made aware of changes to the base series. Would it make sense to implement some kind of event listener so that a series can publish new data or new bar events?
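To make the listener idea a bit more concrete, here is a minimal sketch. It does not use any ta4j types; BarAddedListener, ObservableSeries and DownsampledCloses are hypothetical names invented for illustration, and bars are reduced to a single close price. The derived series simply keeps the close of every n-th base bar, which is a strong simplification of a real resolution converter.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener that is told about every bar appended to a base series. */
interface BarAddedListener {
    void onBarAdded(double closePrice, int index);
}

/** Hypothetical base series: stores close prices and publishes "new bar" events. */
class ObservableSeries {
    private final List<Double> closes = new ArrayList<>();
    private final List<BarAddedListener> listeners = new ArrayList<>();

    void addListener(BarAddedListener listener) {
        listeners.add(listener);
    }

    /** Appends a bar at the end and notifies all registered listeners. */
    void addBar(double closePrice) {
        closes.add(closePrice);
        int index = closes.size() - 1;
        for (BarAddedListener listener : listeners) {
            listener.onBarAdded(closePrice, index);
        }
    }
}

/** Hypothetical derived series: records the close of every factor-th base bar. */
class DownsampledCloses implements BarAddedListener {
    private final int factor;
    private final List<Double> closes = new ArrayList<>();

    DownsampledCloses(int factor) {
        this.factor = factor;
    }

    @Override
    public void onBarAdded(double closePrice, int index) {
        // A lower-resolution bar is complete once "factor" base bars have arrived.
        if ((index + 1) % factor == 0) {
            closes.add(closePrice);
        }
    }

    List<Double> getCloses() {
        return closes;
    }
}
```

With a factor of 3, a 5 minute base series would feed a 15 minute derived series without the derived series ever having to poll the base series.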
Sparse time series:
I am currently working with a large number of OTC data sets, which leads to thinly populated time series with huge gaps between consecutive bars. While this is fine for most use cases, volume and volatility based indicators should sometimes take empty bars into account. The standard deviation of trade volume gives a very different picture if, say, 5 of 9 potential bars are empty (zero volume) and are omitted.
Of course empty bars can be added at data creation, but a "native" time based sparse time series would make more sense. This gets a bit tricky during market close (weekends or holidays).
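A small, self-contained illustration of the effect on the standard deviation. Nothing here is ta4j API: SparseVolumeExample, fillGaps and stdDev are hypothetical helpers, slots are plain integers rather than timestamps, and the population standard deviation is used as an assumption.

```java
import java.util.Map;

class SparseVolumeExample {

    /** Expands a sparse slot -> volume map into a dense array, using 0 for empty slots. */
    static double[] fillGaps(Map<Integer, Double> volumeBySlot, int slotCount) {
        double[] dense = new double[slotCount]; // all slots start at 0.0
        for (Map.Entry<Integer, Double> entry : volumeBySlot.entrySet()) {
            dense[entry.getKey()] = entry.getValue();
        }
        return dense;
    }

    /** Population standard deviation of the given values. */
    static double stdDev(double[] values) {
        double sum = 0;
        for (double value : values) {
            sum += value;
        }
        double mean = sum / values.length;
        double squaredDeviations = 0;
        for (double value : values) {
            squaredDeviations += (value - mean) * (value - mean);
        }
        return Math.sqrt(squaredDeviations / values.length);
    }
}
```

For example, with volumes 100, 120, 80 and 100 in 4 of 9 slots, the standard deviation over the 4 populated bars is much smaller than the one over all 9 slots with the 5 empty ones counted as zero volume.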
Hi @KilianB , I also thought about a map based time series implementation and this is a very welcome feature!
How is this supposed to work in between? Let's say I have a time series with daily values. Now I want to call getBar() with a timestamp that is somewhere in the middle of the day. Will I get the proper bar? Or do I have to specify the exact timestamp of the start of the bar?
A good question. I think arguments can be made for both scenarios. But first we have to ask: what is the proper bar? Do we want to return the bar that starts on the given day (most likely), or the bar afterwards, because that is the information the user could actually act on when backtesting (the opening value would already have passed)?
I vote for returning the bar if the time is within the bar. This makes the overall experience much easier, as the user does not have to care about seconds or minutes but can simply pass a timestamp to the function. Additionally, navigable maps are made for exactly this task, so we don't have to do much legwork. If you care about exact values, we could add another function, or let the user manually check whether openTime == supplied time.
What is your take on it? @jurepetrovic
@KilianB I agree here. I would return the bar "containing" the time. So for example if you have 3 bars:
Bar1 from 10:00-11:00
Bar2 from 11:00-12:00
Bar3 from 12:00-13:00

getBar("10:21") --> Bar1
getBar("11:00") --> Bar2
getBar("11:59") --> Bar2
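The "containing bar" lookup could be sketched with a NavigableMap roughly as follows. This is only an illustration under assumptions: TimeIndex is a hypothetical helper and not part of ta4j, all bars are assumed to have the same fixed duration, and the begin time is treated as inclusive and the end time as exclusive.

```java
import java.time.Duration;
import java.time.ZonedDateTime;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical helper that maps bar begin times to bar indices. */
class TimeIndex {
    private final NavigableMap<ZonedDateTime, Integer> indexByBeginTime = new TreeMap<>();
    private final Duration barDuration; // assumption: every bar has the same length

    TimeIndex(Duration barDuration) {
        this.barDuration = barDuration;
    }

    /** Registers a bar by its begin time and its index in the series. */
    void add(ZonedDateTime beginTime, int index) {
        indexByBeginTime.put(beginTime, index);
    }

    /** Returns the index of the bar whose [begin, end) period contains time, or -1. */
    int getIndex(ZonedDateTime time) {
        // floorEntry yields the bar with the latest begin time that is <= time
        Map.Entry<ZonedDateTime, Integer> entry = indexByBeginTime.floorEntry(time);
        if (entry == null) {
            return -1; // time lies before the first bar
        }
        ZonedDateTime endTime = entry.getKey().plus(barDuration);
        // reject times that fall into a gap after the candidate bar
        return time.isBefore(endTime) ? entry.getValue() : -1;
    }
}
```

With three hourly bars starting at 10:00, 11:00 and 12:00 registered under indices 0, 1 and 2, getIndex for 10:21 gives 0, and both 11:00 and 11:59 give 1, matching the example above.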
Currently, when backtesting on multiple periods at the same time, I calculate these indexes manually. For 60 minutely indices, I add 1 hourly....
No activity for a while, so maybe this is a solved problem. I will comment anyway as I would rather the interfaces be kept lean.
Without knowing exactly what you are doing... all the dates in a bar series are strictly ordered. A simple binary search over a large bar series (I used one with ~2.5 million bars) takes fractions of millis. "I'm thinking of a number from one to the number of bars since Jan 1, 1970"
This is not tested well at all, but here is the idea... returning a Bar instead of an index is trivial
public static int barSeriesIndexOf(BarSeries bs, ZonedDateTime dateTime) {
//perhaps use getBegin/End index here; I don't get "removed bars" so I will let someone else fix this
if ( dateTime.isBefore(bs.getFirstBar().getBeginTime()) ) {
return -1;
}
if ( dateTime.isAfter(bs.getLastBar().getEndTime()) ) {
return -1;
}
return barSeriesIndexOf(bs, dateTime, 0, bs.getBarCount() - 1);
}
public static int barSeriesIndexOf(BarSeries bs, ZonedDateTime dateTime, int startIndex, int endIndex) {
// empty search range: no bar contains dateTime (e.g. it falls into a gap between bars)
if ( startIndex > endIndex ) {
return -1;
}
// midpoint of the current search range [startIndex, endIndex]
int middleIndex = startIndex + (endIndex - startIndex) / 2;
Bar bar = bs.getBar(middleIndex);
if ( bar.inPeriod(dateTime) ) {
return middleIndex;
}
else if ( dateTime.isBefore(bar.getBeginTime()) ) {
return barSeriesIndexOf(bs, dateTime, startIndex, middleIndex - 1);
}
// dateTime is neither inside this bar nor before it, so continue in the upper half
return barSeriesIndexOf(bs, dateTime, middleIndex + 1, endIndex);
}
"I also thought about a map based time series implementation and this is a very welcome feature!"
@team172011 Can you please explain the advantages of using a "time indexed" BarSeries instead of an "index based" BarSeries? For the caching (https://github.com/ta4j/ta4j/pull/907), it makes sense to have something like Map<ZonedDateTime, T>, but it is not fully clear to me why one would use a "time indexed" BarSeries.