
Some entries in the CSI300 constituents file csi300.txt conflict

Open 17hhh opened this issue 7 months ago • 4 comments

❓ Questions and Help

I used collector.py under cn_index to fetch the CSI300 index constituents from 2008 to 2025 and found that the years for some stocks are wrong. How can I get correct data?

17hhh • May 08 '25 13:05

I tried to generate csi300.txt with the command `python collector.py --index_name CSI300 --qlib_dir ~/.qlib/qlib_data/cn_data --method parse_instruments` and didn't find the problem you mentioned in it.

SunsetWolf • May 14 '25 07:05

Thank you for your response. I ran the command above again. Could you take a look at the stock SH600023? It has entries for two periods, 6/16/2014 to 6/12/2020 and 1/1/2005 to 5/14/2025, and these two periods clearly overlap. The problem is easier to spot if the file is sorted by stock code.
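To find every such conflict instead of spotting them by eye, here is a minimal sketch that groups the entries of csi300.txt by symbol and reports overlapping periods. It assumes the usual qlib instruments layout of one `SYMBOL<TAB>START<TAB>END` line per period with ISO dates (the dates above appear in month/day/year form) and assumes the file lives at `~/.qlib/qlib_data/cn_data/instruments/csi300.txt`; `parse_instruments` and `find_overlaps` are illustrative names, not qlib functions.

```python
from collections import defaultdict
from datetime import date
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

Period = Tuple[date, date]


def parse_instruments(lines: Iterable[str]) -> Dict[str, List[Period]]:
    """Group 'SYMBOL START END' lines by symbol; each symbol's periods sorted by start date."""
    periods: Dict[str, List[Period]] = defaultdict(list)
    for line in lines:
        fields = line.split()  # assumed whitespace-separated (qlib writes tabs)
        if not fields:
            continue  # skip blank lines
        symbol, start, end = fields[:3]
        periods[symbol].append((date.fromisoformat(start), date.fromisoformat(end)))
    # Sort symbols and periods so conflicting entries end up next to each other
    return {sym: sorted(ps) for sym, ps in sorted(periods.items())}


def find_overlaps(periods: Dict[str, List[Period]]) -> List[Tuple[str, Period, Period]]:
    """Return (symbol, earlier_period, overlapping_period) for every detected overlap."""
    overlaps = []
    for symbol, ps in periods.items():
        latest = ps[0]  # the period with the latest end date seen so far
        for cur in ps[1:]:
            if cur[0] <= latest[1]:  # starts before an earlier period has ended
                overlaps.append((symbol, latest, cur))
            if cur[1] > latest[1]:
                latest = cur
    return overlaps


if __name__ == "__main__":
    # Assumed location of the file generated by collector.py
    path = Path("~/.qlib/qlib_data/cn_data/instruments/csi300.txt").expanduser()
    if path.exists():
        with path.open() as f:
            for symbol, a, b in find_overlaps(parse_instruments(f)):
                print(f"{symbol}: {a[0]}..{a[1]} overlaps {b[0]}..{b[1]}")
```

With the two SH600023 entries above (converted to ISO dates), this reports `SH600023: 2005-01-01..2025-05-14 overlaps 2014-06-16..2020-06-12`. Tracking the latest end date, rather than only comparing neighbouring periods, also catches one long period that spans several shorter ones.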

17hhh • May 14 '25 12:05

I found the problem. Would you like to fix it? Code contributions are very welcome.

SunsetWolf • May 14 '25 13:05


I have identified the cause of the issue. When retrieving announcements of constituent changes, the collector searches for the title “Announcement on Adjustments to the Constituents of the CSI 300 and CSI Hong Kong 100 Indices”. However, after 2022 the official CSI website changed how these announcements are titled, so the later announcements are no longer found and the constituent changes from that point on are missing. The resulting dataset is therefore incomplete, which produces the conflicting date ranges reported above. I'm sorry that I have only identified the cause and have not yet fixed it.
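One possible direction for a fix is to match announcement titles by keywords instead of by one exact title. The sketch below contrasts the two approaches. It is not qlib's collector code: `exact_match`, `keyword_match`, the keyword list, and the second and third sample titles are all hypothetical, and titles are shown in English as quoted in this thread, whereas the CSI website publishes them in Chinese, so real keywords would need to be adapted.

```python
import re
from typing import Tuple

# Title quoted in this thread (English rendering); an exact-match filter only finds this wording
OLD_TITLE = ("Announcement on Adjustments to the Constituents of the "
             "CSI 300 and CSI Hong Kong 100 Indices")


def exact_match(title: str) -> bool:
    """Fragile filter: stops matching as soon as the publisher rewords the title."""
    return title == OLD_TITLE


def keyword_match(title: str, index_name: str = "CSI 300",
                  keywords: Tuple[str, ...] = ("constituent", "adjust")) -> bool:
    """Hypothetical looser filter: the title names the index and mentions a change keyword."""
    t = title.lower()
    # (?!\d) keeps "CSI 300" from matching the start of a longer number such as "3000"
    names_index = re.search(re.escape(index_name.lower()) + r"(?!\d)", t) is not None
    return names_index and any(k in t for k in keywords)


if __name__ == "__main__":
    # Hypothetical titles for illustration only, not real CSI announcements
    titles = [
        OLD_TITLE,
        "Announcement on the Periodic Adjustment of Constituents for the SSE 50, CSI 300 and Other Indices",
        "Announcement on CSI 300 Index Methodology Revision",
    ]
    for title in titles:
        print(exact_match(title), keyword_match(title), title)
```

A keyword filter trades precision for recall: it keeps finding reworded announcements but may also pick up unrelated notices that mention adjustments, so the matched list is worth reviewing before the changes are applied.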

17hhh • Aug 28 '25 07:08