Cannot dump to bin after Yahoo data download finishes

Open chujun-L opened this issue 1 year ago • 4 comments

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4681/4681 [8:14:52<00:00, 6.34s/it]
[]
2023-10-27 04:38:09.199 | INFO | data_collector.base:_collector:195 - error symbol nums: 0
2023-10-27 04:38:09.199 | INFO | data_collector.base:_collector:196 - current get symbol nums: 4681
2023-10-27 04:38:09.199 | INFO | data_collector.base:collector_data:209 - 1 finish.
2023-10-27 04:38:09.199 | INFO | data_collector.base:collector_data:216 - total 4681, error: 0
2023-10-27 04:38:09.199 | INFO | collector:download_index_data:236 - get bench data: csi300(000300)......
2023-10-27 04:38:14.459 | INFO | collector:download_index_data:236 - get bench data: csi100(000903)......
2023-10-27 04:38:19.705 | INFO | collector:download_index_data:236 - get bench data: csi500(000905)......
2023-10-27 04:38:24.962 | INFO | data_collector.utils:get_calendar_list:68 - get calendar list: ALL......
2023-10-27 04:38:36.805 | WARNING | data_collector.utils:wrapper:491 - _get_calendar: 1 :2008-05-->Expecting value: line 1 column 1 (char 0)
2023-10-27 04:38:58.022 | INFO | data_collector.utils:get_calendar_list:106 - end of get calendar list: ALL.
[43216:MainThread](2023-10-27 04:38:58,023) INFO - qlib.Initialization - [config.py:416] - default_conf: client.
[43216:MainThread](2023-10-27 04:38:58,460) INFO - qlib.Initialization - [init.py:74] - qlib successfully initialized based on client settings.
[43216:MainThread](2023-10-27 04:38:58,461) INFO - qlib.Initialization - [init.py:76] - data_path={'__DEFAULT_FREQ': PosixPath('/ranzhi/alex/qlib_data')}
2023-10-27 04:39:04.391 | INFO | data_collector.base:normalize:312 - normalize data......
0%| | 0/4474 [00:01<?, ?it/s]
[43216:MainThread](2023-10-27 04:39:30,975) ERROR - qlib.workflow - [utils.py:41] - An exception has been raised[TypeError: cannot convert the series to <class 'float'>].
  File "scripts/data_collector/yahoo/collector.py", line 1223, in <module>
    fire.Fire(Run)
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "scripts/data_collector/yahoo/collector.py", line 1198, in update_data_to_bin
    self.normalize_data_1d_extend(qlib_data_1d_dir)
  File "scripts/data_collector/yahoo/collector.py", line 1082, in normalize_data_1d_extend
    yc.normalize()
  File "/ranzhi/alex/qlib/scripts/data_collector/base.py", line 317, in normalize
    for _ in worker.map(self._executor, file_list):
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/process.py", line 484, in _chain_from_iterable_of_lists
    for element in iterable:
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
    yield fs.pop().result()
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/ranzhi/miniconda3/envs/myqlib/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
TypeError: cannot convert the series to <class 'float'>

chujun-L avatar Oct 27 '23 03:10 chujun-L

I downloaded the data first and then normalized it manually, and could not reproduce the problem you describe. I'm guessing you are using the update_data_to_bin method? That method is currently being modified; please wait until PR 1641 is merged and then retry it.

SunsetWolf avatar Nov 17 '23 03:11 SunsetWolf

Excuse me, do you know why the error "ValueError: 2002-01-->Expecting value: line 1 column 1 (char 0)" appears when normalizing data downloaded from Yahoo Finance with the collector script? It seems to come from _get_calendar. Thanks! [screenshot: 2023-11-23 21:35:49]

ElonJustin7 avatar Nov 23 '23 14:11 ElonJustin7

I noticed that you are asking many questions at the same time, scattered across different issues, and it is hard to understand why. I tried normalizing the data with the yahoo.collector script and could not reproduce the issue you raised; the raw data I used covered 2000 to 2023. Can you provide more information to help me reproduce it?

SunsetWolf avatar Jan 11 '24 13:01 SunsetWolf

Thank you for your attention! I later resolved the issue. I need a VPN to download stock data from Yahoo Finance because of firewall restrictions, but with the VPN enabled, normalizing the data produced the error mentioned above. Turning the VPN off let normalization succeed. Since update_data_to_bin downloads and normalizes in a single run, it hits a problem either way for me: with the VPN off the download fails, and with it on the normalization fails.

So, for the issue of being unable to access Yahoo Finance directly, my suggestion is to first enable VPN to download the data locally, and then turn off VPN before normalizing the data. :)

ElonJustin7 avatar Jan 13 '24 09:01 ElonJustin7
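
For readers who land here from the "Expecting value: line 1 column 1 (char 0)" message: that is the text Python's standard `json` module produces when asked to decode a body that is empty or not JSON at all (for example an HTML error page). That would be consistent with the calendar request returning something other than JSON while the VPN was on, though the thread does not confirm what the response actually contained. The sketch below only reproduces the message and shows one way to surface the start of the body for debugging; `parse_calendar_response` and its `label` parameter are hypothetical names for illustration, not part of qlib.

```python
import json


def parse_calendar_response(body, label):
    """Decode a JSON body; on failure, report what the body started with.

    Hypothetical helper for illustration only (not a qlib function).
    `label` stands in for the month tag seen in the thread, e.g. "2002-01".
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        # Show a short preview so an empty body or HTML page is easy to spot.
        preview = body[:60].strip() or "<empty body>"
        raise ValueError(
            f"{label}-->{exc} (response started with: {preview!r})"
        ) from exc


# An empty body reproduces the exact message quoted in the thread.
try:
    json.loads("")
except json.JSONDecodeError as exc:
    empty_body_message = str(exc)

# An HTML page fails the same way, because "<" cannot start a JSON value.
try:
    parse_calendar_response("<html><body>blocked</body></html>", "2002-01")
except ValueError as exc:
    html_body_message = str(exc)

print(empty_body_message)
print(html_body_message)
```

If the preview shows HTML or an empty string, the request itself is being blocked or redirected, which matches the workaround above of downloading with the VPN on and normalizing with it off.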