
ValueError: could not convert string to float: 'SH600000' when I use dump_bin.py

Open nb7123 opened this issue 1 year ago • 3 comments

🐛 Bug Description

To Reproduce

Steps to reproduce the behavior:

  1. python scripts/data_collector/baostock_5min/collector.py download_data --source_dir ~/.qlib/stock_data/source/hs300_5min_original --start 2022-01-01 --end 2022-01-30 --interval 5min --region HS300

  2. python scripts/dump_bin.py dump_all --csv_path ~/.qlib/stock_data/source/hs300_5min_original --qlib_dir ~/.qlib/qlib_data/samples

Error:

"""
Traceback (most recent call last):
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 198, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 264, in _dump_bin
    self._data_to_bin(df, calendar_list, features_dir)
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 238, in _data_to_bin
    np.hstack([date_index, _df[field]]).astype("<f").tofile(str(bin_path.resolve()))
ValueError: could not convert string to float: 'SH600000'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 508, in <module>
    fire.Fire({"dump_all": DumpDataAll, "dump_fix": DumpDataFix, "dump_update": DumpDataUpdate})
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 568, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 271, in __call__
    self.dump()
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 322, in dump
    self._dump_features()
  File "/Users/didi/Code/Github/qlib/scripts/dump_bin.py", line 313, in _dump_features
    for _ in executor.map(_dump_func, self.csv_files):
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/Users/didi/miniconda3/envs/qlib/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
ValueError: could not convert string to float: 'SH600000'

Screenshot

Environment

Darwin arm64 macOS-14.2-arm64-arm-64bit Darwin Kernel Version 23.2.0: Wed Nov 15 21:54:55 PST 2023; root:xnu-10002.61.3~2/RELEASE_ARM64_T8122

Python version: 3.8.19 (default, Mar 20 2024, 15:27:52) [Clang 14.0.6 ]

Qlib version: 0.9.5

numpy==1.23.5
pandas==2.0.3
scipy==1.10.1
requests==2.32.3
sacred==0.8.6
python-socketio==5.11.4
redis==5.0.8
python-redis-lock==4.0.0
schedule==1.2.2
cvxpy==1.5.2
hyperopt==0.1.2
fire==0.6.0
statsmodels==0.14.1
xlrd==2.0.1
plotly==5.24.1
matplotlib==3.7.5
tables==3.7.0
pyyaml==6.0.2
mlflow==1.14.1
tqdm==4.66.5
loguru==0.7.2
lightgbm==4.5.0
tornado==6.4.1
joblib==1.4.2
fire==0.6.0
ruamel.yaml==0.17.36

Additional Notes

nb7123 • Oct 12 '24 06:10

same issue

ghyzx • Oct 13 '24 07:10

same issue

zqingr • Oct 15 '24 08:10

Based on the information you provided, the problem is the order of your steps. After download_data, you need to run the normalization step (normalize_data) first; only then can you use dump_bin to convert the data to bin format. The procedure is described in the documentation. Before you normalize, you also need to prepare daily-frequency data whose date range covers the date range of the data being normalized.

SunsetWolf • Nov 29 '24 06:11
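
The traceback in the issue shows dump_bin.py casting each field to float32 with astype("<f") and failing on the value 'SH600000', which suggests that a column holding the stock code is being treated as a numeric feature. A minimal sketch for spotting such columns in a CSV before running dump_bin is given below. The function name non_numeric_columns and the sample column names (date, symbol, open, close) are assumptions made for illustration, not part of qlib; the real collector output may use different names.

```python
import pandas as pd


def non_numeric_columns(df, skip=("date",)):
    """Return the names of columns that cannot be cast to float32.

    Hypothetical helper for illustration; it is not part of qlib.
    """
    bad = []
    for col in df.columns:
        if col in skip:
            continue
        try:
            # Same cast that appears in the traceback: "<f" is little-endian float32.
            df[col].to_numpy().astype("<f")
        except (ValueError, TypeError):
            bad.append(col)
    return bad


# Made-up rows shaped roughly like a collector CSV (column names are assumed).
sample = pd.DataFrame(
    {
        "date": ["2022-01-04 09:35:00", "2022-01-04 09:40:00"],
        "symbol": ["SH600000", "SH600000"],
        "open": [8.54, 8.55],
        "close": [8.55, 8.56],
    }
)

print(non_numeric_columns(sample))
```

For the sample above this prints ['symbol']. To check a real file, pass pd.read_csv(path, nrows=100) instead of sample. If a code or symbol column shows up, it may be worth checking whether the CSVs are the normalized ones and whether the exclude_fields or include_fields arguments of dump_bin.py are set so that such columns are not dumped as features.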