Which type of data should the data source contain: adjusted or unadjusted?
❓ Questions and Help
I am so confused about whether the data source of open, close, high, and low should be adjusted or not.
I found the document saying that the data is adjusted, but if we directly place the adjusted data as a data source, why is a factor needed?
I am just confused whether I should put the adjusted close or the unadjusted close.
Stock prices may experience discontinuities under the impact of events such as dividends, rights issues, bonus issues, and capitalization issues. For example, if stock_1, with a stock price of $50, distributes a dividend of $2 per share, the stock price will drop to $48 on the ex-dividend date.
To meet the data requirements for model training and data analysis, data is adjusted to maintain the continuity of historical data. There are two types of adjustments: forward adjustment and backward adjustment, and qlib recommends using backward adjustment.
The data fields provided by qlib, including open, close, high, low, volume, and change, have all undergone backward adjustment. The role of the factor is to restore the adjusted price to the original price during backtesting or other processes. For instance, close / factor represents the original closing price.
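As a minimal sketch of the close / factor relationship (not qlib's own code), the snippet below divides back-adjusted closes by their factors to recover the original closes. The function name `restore_original_price` and the sample numbers are assumptions for illustration; they reuse the $50 to $48 dividend example, so the factor from the ex-dividend date onward is taken to be 50/48.

```python
def restore_original_price(adjusted, factor):
    """Recover original prices as adjusted / factor, element by element."""
    if len(adjusted) != len(factor):
        raise ValueError("adjusted and factor must have the same length")
    return [a / f for a, f in zip(adjusted, factor)]


# Hypothetical sample: day 0 is the last day before the ex-dividend date,
# day 1 is the ex-dividend date, day 2 is an ordinary trading day.
factor = [1.0, 50 / 48, 50 / 48]
adjusted_close = [50.0, 50.0, 52.0]

original_close = restore_original_price(adjusted_close, factor)
print([round(p, 2) for p in original_close])
```

With these sample values the recovered closes should come out to roughly 50, 48, and 49.92, so the $2 drop on the ex-dividend date reappears once the factor is divided out.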
It is recommended that you gain an in-depth understanding of concepts related to stock price adjustments.
Hi, @zhaochaofeng
Thank you for your interest in qlib. The data provided by qlib is back-adjusted. Back adjustment ensures that the historical adjusted prices remain unchanged, although the adjusted price on the latest day may not be the same as the real price. We cannot guarantee that all source data is back-adjusted; some data may be forward-adjusted, in which case it needs to be converted to back-adjusted data. One way of back-adjusting is to set the price on the first day to 1, with all subsequent rises and falls measured from that base of 1. factor is the adjustment factor.
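The "first day equals 1" convention can be sketched as below. This is only an illustration under assumptions: `rebase_to_one` and `simple_returns` are hypothetical helper names, the prices are made up, and this is not presented as qlib's actual normalization code.

```python
def rebase_to_one(prices):
    """Scale a price series so that its first value equals 1."""
    if not prices:
        return []
    first = prices[0]
    if first == 0:
        raise ValueError("first price must be non-zero")
    return [p / first for p in prices]


def simple_returns(prices):
    """Day-over-day simple returns of a price series."""
    return [cur / prev - 1 for prev, cur in zip(prices, prices[1:])]


# Hypothetical back-adjusted closes.
adjusted_close = [20.0, 21.0, 19.95]
rebased = rebase_to_one(adjusted_close)

# Rescaling by a constant leaves the daily returns unchanged.
print(rebased[0])
print(simple_returns(adjusted_close))
print(simple_returns(rebased))
```

Because the whole series is divided by one constant, daily returns are preserved. For close / factor to still give the original price, the factor series would have to be divided by the same constant.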
Hello @zhaochaofeng, so does that mean the close, open, high, and low prices should be forward-adjusted or backward-adjusted before being put into the data source directory? And the factor column is there to recover the original price. I was curious why we don't put the original prices into the data source and let the system convert open, close, high, and low into adjusted prices automatically when the data is used for training or backtesting.
Hi @denggit It is acceptable to store either forward-adjusted or backward-adjusted field data in the qlib file system. However, forward adjustment is not conducive to incremental data updates (as it requires adjusting historical prices based on the current date's price), which is not an issue with backward adjustment.
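The incremental-update difference can be illustrated with a small sketch. The helper names and the `ratios` input (a per-day event ratio, 1.0 on days without a corporate action) are assumptions for this example, not part of qlib. Backward adjustment anchors on the first day, forward adjustment anchors on the latest day.

```python
def cumulative_factors(ratios):
    """Running product of per-day event ratios (1.0 on days without an event)."""
    factors, running = [], 1.0
    for r in ratios:
        running *= r
        factors.append(running)
    return factors


def backward_adjust(raw, ratios):
    """Anchor on the first day: existing history never changes."""
    return [p * f for p, f in zip(raw, cumulative_factors(ratios))]


def forward_adjust(raw, ratios):
    """Anchor on the latest day: every earlier price is rescaled."""
    if not raw:
        return []
    factors = cumulative_factors(ratios)
    latest = factors[-1]
    return [p * f / latest for p, f in zip(raw, factors)]


# Hypothetical history: a $2 dividend takes the price from 50 to 48 on day 1.
raw = [50.0, 48.0, 49.0]
ratios = [1.0, 50 / 48, 1.0]
back_old = backward_adjust(raw, ratios)
fwd_old = forward_adjust(raw, ratios)

# A new day arrives with another $2 dividend (49 -> 47).
raw_new = raw + [47.0]
ratios_new = ratios + [49 / 47]
back_new = backward_adjust(raw_new, ratios_new)
fwd_new = forward_adjust(raw_new, ratios_new)

print(back_new[:3] == back_old)  # True: stored history is untouched
print(fwd_new[:3] == fwd_old)    # False: all stored history must be rewritten
```

In this sketch, appending one day only adds one value to the backward-adjusted series, whereas the forward-adjusted series has to be recomputed from the start whenever a new corporate action occurs.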
In theory, it is also possible to store the original data and factors first, then convert them to adjusted form at the time of use. The reason the qlib file system stores adjusted data and factors is presumably data usage efficiency. Qlib uses adjusted data more frequently, such as in model training and data analysis, and converting to adjusted form before every use is not as straightforward as directly reading pre-adjusted data.
@SunsetWolf Thank you for your explanation!
Thanks @denggit for your reply.
- Qlib's official data processing: get raw data (the data source is Yahoo Finance) -> normalize (back adjustment is done in this step) -> dump into bin files.
- Why not use the original price?
  - The original price cannot be used directly to calculate the strategy return, because the return would be distorted by ex-rights and ex-dividend events.
  - Back-adjusted prices are used for strategy back-testing, return calculations, and trend analysis, and they reflect true investment returns.
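To make the distortion concrete, here is a small sketch that compares the one-day return computed from original prices with the return computed from back-adjusted prices. It reuses the hypothetical $50 stock paying a $2 dividend; the function name and the 50/48 factor are assumptions for illustration.

```python
def simple_return(prev_price, cur_price):
    """Simple one-period return between two prices."""
    return cur_price / prev_price - 1


dividend = 2.0
raw_close = [50.0, 48.0]      # day before ex-dividend, ex-dividend day
factor = [1.0, 50 / 48]       # assumed back-adjustment factor
adjusted_close = [p * f for p, f in zip(raw_close, factor)]

raw_ret = simple_return(raw_close[0], raw_close[1])
adj_ret = simple_return(adjusted_close[0], adjusted_close[1])
# What a holder actually earned: price change plus the cash dividend.
total_ret = (raw_close[1] + dividend) / raw_close[0] - 1

print(f"return from original prices: {raw_ret:.2%}")
print(f"return from adjusted prices: {adj_ret:.2%}")
print(f"holder's total return:       {total_ret:.2%}")
```

The original prices suggest a loss of about 4% even though the holder's wealth did not change, while the adjusted prices give a return close to zero, matching the total return.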