
New Data Stream with Machine learning and RL process

PromediaB opened this issue 5 years ago • 9 comments

I think we should add a data stream that continuously receives newly updated data, trains the model on the new data, and checks whether model performance improves; if it does, we should update the deployed model.

I found an interesting post about it: https://medium.com/analytics-vidhya/data-streams-and-online-machine-learning-in-python-a382e9e8d06a
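Roughly, the loop I have in mind looks like this. A minimal sketch only; `fetch_latest_candles`, `retrain_model`, `evaluate`, and `save_model` are hypothetical helpers, not anything that exists in RLTrader today:

```python
import time

import pandas as pd

# Hypothetical helpers -- none of these exist in the current code base.
from stream_utils import fetch_latest_candles, retrain_model, evaluate, save_model

POLL_INTERVAL = 60 * 60  # look for new hourly candles once an hour

def online_update_loop(model, history: pd.DataFrame):
    """Continuously pull new data, retrain, and keep whichever model scores better."""
    while True:
        new_rows = fetch_latest_candles(after=history.index[-1])
        if not new_rows.empty:
            history = pd.concat([history, new_rows])
            candidate = retrain_model(model, history)
            # Promote the candidate only if it beats the current model
            # on a held-out slice of the most recent data.
            if evaluate(candidate, history) > evaluate(model, history):
                model = candidate
                save_model(model)
        time.sleep(POLL_INTERVAL)
```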

PromediaB avatar Jun 07 '19 01:06 PromediaB

This is a great idea, and is actually what I planned to do for the next article. Since we will be using these algorithms to trade on Coinbase, I will be streaming the data from Coinbase and incrementally training on data as it passes (as well as "making" trades).

However, we still need a starting point before we start trading/training on live data, and I believe the current method of optimize -> train -> test will be our best bet for getting to that starting point.
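Schematically, that bootstrap looks like this; the function names below are illustrative stand-ins for the existing optimize/train/test scripts, not the actual RLTrader entry points:

```python
# Illustrative stand-ins for the existing optimize.py / train.py / test flow;
# these are not the real RLTrader API.
from rl_trader_stub import optimize_hyperparams, train_agent, backtest

params = optimize_hyperparams(n_trials=100)  # 1. search for good hyperparameters
agent = train_agent(params)                  # 2. train on the full historical set
report = backtest(agent)                     # 3. verify on held-out data

# Only once this offline loop produces an agent we trust do we switch
# to streaming live data and training incrementally on top of it.
```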

notadamking avatar Jun 07 '19 03:06 notadamking

Thank you for your answer.

For now, I'm thinking of adding a cron job that will automatically update the data from https://www.cryptodatadownload.com/cdd/Coinbase_BTCUSD_1h.csv every hour.

Is there anything in the code that will take only the new data, to avoid re-learning from previous data, or should we add that ourselves?

If so, where is the best place to add it: train.py or optimize.py?

PromediaB avatar Jun 07 '19 03:06 PromediaB

This is currently not set up in the code, though the best place to add it would be train.py.
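A minimal sketch of what that fetch step in train.py could look like. The local cache path, the column name, and the `skiprows=1` banner line are assumptions about the CSV's layout, so check them against the actual file:

```python
import pandas as pd

DATA_URL = "https://www.cryptodatadownload.com/cdd/Coinbase_BTCUSD_1h.csv"
LOCAL_PATH = "data/coinbase_btcusd_1h.csv"   # hypothetical local cache
TIMESTAMP_COL = "Date"                       # assumed column name; check the CSV header

def fetch_new_rows() -> pd.DataFrame:
    """Download the hourly CSV and return only rows we have not stored yet."""
    # cryptodatadownload files usually start with a one-line site banner,
    # hence skiprows=1; drop it if this file turns out to be header + data only.
    remote = pd.read_csv(DATA_URL, skiprows=1)
    try:
        local = pd.read_csv(LOCAL_PATH)
        # Assumes TIMESTAMP_COL sorts chronologically (a unix-seconds column is ideal).
        new_rows = remote[remote[TIMESTAMP_COL] > local[TIMESTAMP_COL].max()]
    except FileNotFoundError:
        local, new_rows = remote.iloc[0:0], remote   # first run: everything is new

    if not new_rows.empty:
        pd.concat([local, new_rows]).to_csv(LOCAL_PATH, index=False)
    return new_rows
```

A cron entry would then only need to call this once an hour; train.py can decide separately how often to actually retrain on the accumulated rows.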

notadamking avatar Jun 07 '19 04:06 notadamking

Could this be an over-optimization? Adding an extra hour of data to a set that is 4-5 years old is a tiny improvement. And you're going to re-train every hour? If I were doing this myself, I would retrain at most every few days, probably even weekly.

Counter-argument is perhaps the most recent data is much more valuable than data from 5 years ago.

JohnAllen avatar Jun 07 '19 22:06 JohnAllen

@JohnAllen is correct. Re-training on each new data point would be over-zealous, and isn't likely to have much more benefit than re-training each day/few days. A live algorithm should therefore store data as it passes, and only re-train every n time steps. Training should also be done in a background thread to allow the agent to continue trading while training.
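Something along these lines is what I mean; a rough sketch only, where `retrain` and `execute_order` are hypothetical placeholders for the actual training and exchange calls:

```python
import threading

RETRAIN_EVERY = 48  # hourly candles -> retrain roughly every two days

class LiveAgent:
    """Keeps trading continuously; retrains in a background thread every n candles."""

    def __init__(self, model):
        self.model = model
        self.buffer = []            # candles collected since the last retrain
        self.lock = threading.Lock()

    def on_new_candle(self, candle):
        self.buffer.append(candle)
        self.trade(candle)          # never block trading on training
        if len(self.buffer) >= RETRAIN_EVERY:
            batch, self.buffer = self.buffer, []
            threading.Thread(target=self._retrain, args=(batch,), daemon=True).start()

    def _retrain(self, batch):
        new_model = retrain(self.model, batch)   # hypothetical training helper
        with self.lock:                          # swap models atomically
            self.model = new_model

    def trade(self, candle):
        with self.lock:
            action = self.model.predict(candle)  # hypothetical model API
        execute_order(action)                    # hypothetical exchange call
```

Swapping the model under a lock keeps the trading thread from reading a half-updated model while a background retrain finishes.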

notadamking avatar Jun 08 '19 06:06 notadamking

Perfect, then we will add something to the code to run training every 2 days in the background, and let the agent trade on BitMEX at the same time.

PromediaB avatar Jun 08 '19 15:06 PromediaB

Did you try this on BitMEX? Any success?

amebamcare avatar Jul 08 '19 07:07 amebamcare

What about implementing ARIMA model forecasting in the current project? Is it possible, and might it be better than the idea of deploying a new live data stream with ML on top of the current project? Thoughts?
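For reference, a bare-bones ARIMA forecast on the same hourly closes would look something like this with statsmodels; the (5, 1, 0) order and the file/column names are placeholders, not tuned or verified choices:

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Assumes the same hourly CSV discussed above, with a "Close" column.
df = pd.read_csv("data/coinbase_btcusd_1h.csv")
closes = df["Close"].astype(float)

# Fit a simple ARIMA(p, d, q); the order here is a placeholder, not tuned.
model = ARIMA(closes, order=(5, 1, 0))
fitted = model.fit()

# Forecast the next 24 hours of closing prices.
forecast = fitted.forecast(steps=24)
print(forecast)
```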

amebamcare avatar Jul 12 '19 16:07 amebamcare

I would only extract the box transform from the ARIMA project. The rest isn't promising. Do they show the performance of the approach? I only found correlation and price curves in the description.
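Assuming the "box transform" meant here is the Box-Cox power transform, extracting just that piece is straightforward with scipy (file and column names are the same assumed placeholders as above):

```python
import pandas as pd
from scipy.stats import boxcox
from scipy.special import inv_boxcox

df = pd.read_csv("data/coinbase_btcusd_1h.csv")   # same assumed hourly file as above
closes = df["Close"].astype(float).values

# Box-Cox requires strictly positive values, which prices are.
transformed, lmbda = boxcox(closes)   # lmbda is estimated by maximum likelihood

# ...feed `transformed` to the model, then invert the transform on predictions:
restored = inv_boxcox(transformed, lmbda)
```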

ghost avatar Jul 14 '19 06:07 ghost