
Consider resampling the ccxt data or binance data

Open XieXiaonan opened this issue 4 years ago • 4 comments

Thank you for your amazing project. I am learning it now. When I tried to use BinanceData or CCXTData, I found that there are only 9900 rows, but there should be 7 * 24 * 60 = 10080 rows. That means some data is missing from ccxt or from Binance. Since time series analysis should start by resampling the data to a regular frequency, how about adding a resampling option for the downloaded data? Thank you again for your great work.

XieXiaonan avatar Jun 29 '21 18:06 XieXiaonan

@XieXiaonan there are gaps when Binance is down. Resampling means changing the frequency of the data, but it sounds like you would rather fill in the missing data points with NaN? Please elaborate.

polakowo avatar Jun 29 '21 18:06 polakowo

I mean resampling with the same frequency and filling the missing data with the previous data. For example, suppose the data between 01:00 and 01:20 is lost due to Binance. We can fill all the candlesticks in that interval with Open=High=Low=Close=Close['00:59']. Of course we can do it without vectorbt, but I believe it is a necessary step for everyone (otherwise time series analysis makes no sense when the data frequency keeps changing). So I think it could be added to vbt.
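
A minimal sketch of the fill rule described above, written with plain pandas rather than any vectorbt API. The column names, the sample prices, the helper name `fill_gaps_with_flat_candles`, and the choice to set Volume to 0 for synthetic candles are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical 1-minute candles with two missing minutes (01:00 and 01:01).
idx = pd.to_datetime(
    ["2021-06-29 00:58", "2021-06-29 00:59", "2021-06-29 01:02"], utc=True
)
ohlcv = pd.DataFrame(
    {
        "Open": [100.0, 101.0, 103.0],
        "High": [101.5, 102.0, 104.0],
        "Low": [99.5, 100.5, 102.5],
        "Close": [101.0, 101.5, 103.5],
        "Volume": [10.0, 12.0, 8.0],
    },
    index=idx,
)


def fill_gaps_with_flat_candles(df, freq="1min"):
    """Put df on a regular grid and fill gaps with flat candles.

    Each missing candle gets Open=High=Low=Close equal to the last
    known Close. Volume is set to 0 (an assumption, not from the thread).
    """
    full = df.asfreq(freq)                   # missing timestamps become NaN rows
    missing = ~full.index.isin(df.index)     # rows that were not in the download
    prev_close = full["Close"].ffill()       # last known Close at every timestamp
    for col in ["Open", "High", "Low", "Close"]:
        full.loc[missing, col] = prev_close[missing]
    full.loc[missing, "Volume"] = 0.0
    return full


filled = fill_gaps_with_flat_candles(ohlcv)
print(filled)
```

Rows that were present in the download are left untouched; only the inserted timestamps receive the flat candle.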

XieXiaonan avatar Jun 29 '21 23:06 XieXiaonan

Forward filling missing data isn't always the best approach. Having hundreds of data points with the same price and volume can impact your model severely (since it doesn't know that your data is missing, it just thinks that everything stays the same and can easily overfit to that data). I would rather fill those gaps with NaN so your algorithm knows there is missing data, and you as a user can then forward fill those NaNs if you want.
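
For illustration, the NaN-first approach can be sketched with plain pandas as follows. This is not vectorbt's implementation; the price series and the 01:00 to 01:20 gap are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical 1-minute close prices from 00:55 to 01:25 (31 points).
full_index = pd.date_range(
    "2021-06-29 00:55", "2021-06-29 01:25", freq="1min", tz="UTC"
)
close = pd.Series(np.arange(len(full_index), dtype=float), index=full_index)

# Simulate an exchange outage: drop 01:00 through 01:20 (21 points).
gappy = close.drop(full_index[5:26])

# Step 1: put the data back on a regular 1-minute grid.
# Timestamps that were missing show up as NaN, so the gap stays visible.
regular = gappy.asfreq("1min")
n_missing = int(regular.isna().sum())
print(f"{n_missing} of {len(regular)} rows are missing")

# Step 2 (optional, the user's choice): forward fill the NaNs.
filled = regular.ffill()
```

Keeping the two steps separate means downstream code can still tell real observations from filled ones, for example through `regular.isna()`.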

polakowo avatar Jun 30 '21 10:06 polakowo

Cool, filling with nan is also what I want

XieXiaonan avatar Jun 30 '21 13:06 XieXiaonan