btc_data Generating technical indicators for intervals and periods

Hi,

After creating the master BTC_Data.csv file, it needs to be broken down into the respective indicator files for the different intervals (1, 2, 3) and periods (1, 7, 30, 90 days etc). There seems to be a loose framework for the interval file generation in the Feature_Selection notebooks, but I just want to confirm the methodology before proceeding.

Do you already have this code in a loop that will generate each file automatically, or do the notebooks require manual editing for each iteration? If the latter, can you please clarify which lines need to be updated in Feature_Collection_reg.ipynb and Feature_Collection_cls.ipynb to generate all the different combinations of technical indicators on each run?

Thanks, J.

Feb 16 '21 16:02 jcfbeardsley

Hi, The features are selected manually by looking at criteria such as correlation, feature importance, train/test scores, and performance metrics. It is an iterative process. You may try some of the features reported in the manuscript, but other sets of features may yield good performance as well. You can select the features in lines 70-72 of Feature_Collection_reg.ipynb and lines 65-67 of Feature_Collection_cls.ipynb.
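As a rough illustration of the correlation part of that process only (this is not the code in the notebooks), a first-pass ranking could look like the sketch below. The helper name rank_by_correlation, the column names sma_like and noise, and the synthetic data are all made up for the example; only the priceUSD column name comes from the notebooks.

```python
import numpy as np
import pandas as pd


def rank_by_correlation(df, target="priceUSD", top_n=5):
    """Return the top_n feature names ranked by |Pearson correlation| with target.

    Hypothetical helper for illustration; the notebooks select features manually.
    """
    corr = df.corr()[target].drop(target).abs()
    return corr.sort_values(ascending=False).head(top_n).index.tolist()


# Synthetic stand-in data (assumption): a random-walk "price", one feature that
# tracks it closely, and one feature that is pure noise.
rng = np.random.RandomState(0)
price = rng.normal(size=200).cumsum()
demo = pd.DataFrame({
    "priceUSD": price,
    "sma_like": price + rng.normal(scale=0.1, size=200),
    "noise": rng.normal(size=200),
})

top = rank_by_correlation(demo, top_n=1)
print(top)
```

Correlation alone is only one of the criteria mentioned above, so a ranking like this would still need to be checked against feature importance and train/test scores.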

Best, Mohammed Mudassir

Feb 16 '21 21:02 heliphix

Thanks for clarifying, Mohammed.

Is it correct to assume that the only lines that need to be updated to change between the different intervals and prediction timeframes are the following:

Feature_Selection_reg Feature Selection for Interval 1:

df=data.loc[interval1]

#%%

#df['priceUSD']=one.loc[interval3]

Changes to the following for Feature selection for Interval 3 at the 7 day prediction:

df=data.loc[interval3]

#%%

df['priceUSD']=seven.loc[interval3]

and swapping:

X_high.to_csv('reg_interval1.csv',sep=',',index=False)

for:

X_high.to_csv('reg_seven.csv',sep=',',index=False)
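For what it's worth, here is a rough sketch of how I imagine those manual edits could be wrapped in a loop. It is only my guess at the structure: build_frames, the intervals/targets dictionaries, and the toy data are hypothetical, and I am assuming interval1/interval3 can be used directly with .loc and that one/seven are series sharing the index of data. The feature selection step that produces X_high is left out.

```python
import pandas as pd


def build_frames(data, intervals, targets, combos):
    """Yield (csv_name, df) for each (interval_key, target_key) combination.

    target_key=None keeps the original priceUSD column (the interval 1 case);
    otherwise priceUSD is replaced by the shifted target for that horizon.
    """
    for interval_key, target_key in combos:
        idx = intervals[interval_key]
        df = data.loc[idx].copy()
        if target_key is not None:
            df["priceUSD"] = targets[target_key].loc[idx]
        # Reproduces the two file names used above: reg_interval1.csv, reg_seven.csv
        name = f"reg_{target_key or interval_key}.csv"
        yield name, df


# Toy stand-ins (assumptions) for the notebook's data, seven, and interval objects.
data = pd.DataFrame(
    {"priceUSD": [1.0, 2.0, 3.0, 4.0], "feat": [10, 20, 30, 40]},
    index=pd.date_range("2020-01-01", periods=4),
)
seven = pd.Series([5.0, 6.0, 7.0, 8.0], index=data.index)
intervals = {
    "interval1": slice("2020-01-01", "2020-01-02"),
    "interval3": slice("2020-01-03", "2020-01-04"),
}
targets = {"seven": seven}
combos = [("interval1", None), ("interval3", "seven")]

frames = dict(build_frames(data, intervals, targets, combos))
for name, df in frames.items():
    # Feature selection would go here, followed by something like:
    # X_high.to_csv(name, sep=',', index=False)
    print(name, df.shape)
```

Extending combos with the other interval and horizon pairs would then cover the remaining files without editing the notebook by hand each time.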

I'm first looking to reproduce your 7 day price forecast using the different models (hence the interval3/seven combination), but if it's easier to supply a copy of the notebook you used to generate the reg_seven.csv files for the paper, I'm happy to work through the differences myself.

Thanks again for all your help.

Feb 25 '21 23:02 jcfbeardsley

Hi, The lines of code and the procedure you have highlighted for choosing the interval and forecast period are correct. I will have to search for that reg_seven.csv file. Nevertheless, you should be able to generate a slightly different set of features and still get model performance comparable to what I have reported in the manuscript.

Edit: There is a reg_seven.csv file in commit b80f8913e0, as you have reported in issue #9. Please try out the code using Python 3.6. Slight discrepancies in the selected features should not massively reduce the performance.

Feb 26 '21 10:02 heliphix

I want to offer a new point of view, and my collaboration.

Why this stock prediction project? Things this project offers that I did not find in other free projects are:

- Testing with around 30 models. Multiple combinations of features and multiple selections of models (TensorFlow, XGBoost and Sklearn)
- Threshold and quality evaluation of models
- Use of 1k technical indicators
- Method for selecting the best features (technical indicators)
- Categorical target (do buy, do sell and do nothing), simple and dynamic, instead of a continuous target variable
- Powerful open-market real-time evaluation system
- Versatile integration with Twitter, Telegram and Mail
- Training the Machine Learning model with fresh stock data from today

https://github.com/Leci37/stocks-prediction-Machine-learning-RealTime-telegram/tree/develop

Jan 25 '23 01:01 Leci37