qlib
Update yahoo data to binary error
🐛 Bug Description
I want to update the daily-frequency data automatically. I wrote a crontab job, but it doesn't work, so I tested manually with the following command:
python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir /data/qlib/qlib_data/cn_data --trading_date 2021-06-11 --end_date 2022-01-15
I got the following output:
2022-01-19 16:08:14.819 | INFO | collector:get_instrument_list:196 - get HS stock symbols......
2022-01-19 16:08:27.547 | INFO | collector:get_instrument_list:198 - get 4323 symbols.
2022-01-19 16:08:27.558 | INFO | data_collector.base:collector_data:204 - start collector data......
2022-01-19 16:08:27.558 | INFO | data_collector.base:collector_data:209 - getting data: 1
0%| | 0/4323 [00:00<?, ?it/s]2022-01-19 16:08:30.050 | WARNING | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:08:30.051 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:09.459 | WARNING | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:09.459 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:11.971 | WARNING | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:11.972 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:16.339 | WARNING | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:16.340 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:18.811 | WARNING | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:18.812 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:18.812 | WARNING | data_collector.base:save_instrument:164 - 000001.sz is empty
0%| | 1/4323 [00:51<61:31:57, 51.25s/it]2022-01-19 16:09:21.320 | WARNING | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:21.321 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:23.824 | WARNING | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:23.825 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:27.271 | WARNING | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:27.272 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:29.752 | WARNING | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:29.753 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:32.478 | WARNING | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:32.479 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:32.480 | WARNING | data_collector.base:save_instrument:164 - 000002.sz is empty
0%| | 2/4323 [01:04<34:58:51, 29.14s/it]2022-01-19 16:09:34.966 | WARNING | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:34.967 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:37.443 | WARNING | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:37.444 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:39.923 | WARNING | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:39.924 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:42.416 | WARNING | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:42.417 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:44.887 | WARNING | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:44.888 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:44.888 | WARNING | data_collector.base:save_instrument:164 - 000004.sz is empty
0%| | 3/4323 [01:17<25:48:09, 21.50s/it]2022-01-19 16:09:47.383 | WARNING | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:47.383 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:49.907 | WARNING | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:49.908 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:52.393 | WARNING | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:52.394 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:54.862 | WARNING | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:54.863 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:57.343 | WARNING | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:57.344 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:57.345 | WARNING | data_collector.base:save_instrument:164 - 000005.sz is empty
0%|▏ | 4/4323 [01:29<21:30:44, 17.93s/it]2022-01-19 16:09:59.847 | WARNING | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:09:59.848 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:02.340 | WARNING | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:02.341 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:04.816 | WARNING | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:04.817 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:07.303 | WARNING | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:07.304 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:10.044 | WARNING | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:10.045 | WARNING | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:10.045 | WARNING | data_collector.base:save_instrument:164 - 000006.sz is empty
0%|▏ | 5/4323 [01:42<19:14:42, 16.04s/it]2022-01-19 16:10:53.199 | WARNING | collector:get_data_from_remote:141 - 000007.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Max retries exceeded with url: /v8/finance/chart/000007.sz?period1=1623340800&period2=1642176000&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com (Caused by ReadTimeoutError("HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Read timed out. (read timeout=5)"))
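As a sanity check on the failing request, the period1 and period2 values in the URL above appear to be the command's --trading_date and --end_date taken as midnight in UTC+8, the same +08:00 offset printed in every log line. The sketch below reproduces that conversion. to_period is a hypothetical helper, not part of qlib, and the UTC+8 interpretation is inferred from the log rather than read from the collector's code. If the numbers match, the dates were passed through correctly and the failure is more likely in reaching query2.finance.yahoo.com, which fits the ReadTimeoutError.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the dates are localized to UTC+8 (China Standard Time, no DST),
# matching the "+08:00" offsets shown in the log.
CST = timezone(timedelta(hours=8))


def to_period(date_str: str, tz: timezone = CST) -> int:
    """Convert a YYYY-MM-DD date to the Unix timestamp used as period1/period2."""
    local_midnight = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    return int(local_midnight.timestamp())


if __name__ == "__main__":
    for d in ("2021-06-11", "2022-01-15"):
        print(d, to_period(d))
```

Running it prints 1623340800 and 1642176000, the same values as period1 and period2 in the failing URL.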
To Reproduce
Steps to reproduce the behavior:
1. python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir /data/qlib/qlib_data/cn_data --trading_date 2021-06-11 --end_date 2022-01-15
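Since the original goal was to run this step from crontab, note that cron starts jobs with a minimal environment and usually not from the repository root. A relative script path, a bare python, or proxy variables set in an interactive shell may therefore not carry over. Below is a minimal wrapper sketch under those assumptions. The repository path, data directory, 7-day window, and the function names are hypothetical. Only the update_data_to_bin subcommand and its flags come from the command above, and whether end_date is inclusive should be checked against your qlib version.

```python
import subprocess
import sys
from datetime import date, timedelta
from pathlib import Path
from typing import List

# Hypothetical locations: adjust to your own checkout and data directory.
REPO_ROOT = Path("/opt/qlib")
QLIB_DATA_1D_DIR = "/data/qlib/qlib_data/cn_data"


def build_update_command(repo_root: Path, qlib_dir: str,
                         trading_date: date, end_date: date) -> List[str]:
    """Build the update_data_to_bin command line using absolute paths."""
    script = repo_root / "scripts" / "data_collector" / "yahoo" / "collector.py"
    return [
        sys.executable,  # the interpreter running this wrapper, not whatever "python" cron finds
        str(script),
        "update_data_to_bin",
        "--qlib_data_1d_dir", qlib_dir,
        "--trading_date", trading_date.isoformat(),
        "--end_date", end_date.isoformat(),
    ]


def run_update(lookback_days: int = 7) -> int:
    """Re-collect a rolling window ending today (hypothetical policy)."""
    today = date.today()
    cmd = build_update_command(REPO_ROOT, QLIB_DATA_1D_DIR,
                               today - timedelta(days=lookback_days), today)
    # Run from the repository root, mirroring the manual command in this issue.
    completed = subprocess.run(cmd, cwd=str(REPO_ROOT))
    return completed.returncode
```

A crontab entry would then call a script that invokes run_update() (for example, sys.exit(run_update()) under a main guard), using the absolute path of the interpreter that has qlib installed.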
Expected Behavior
The daily data under /data/qlib/qlib_data/cn_data is updated and dumped to binary format without errors.
Screenshot
Environment
Note: Users can run cd scripts && python collect_info.py all under the project directory to get system information and paste it here directly.
- Qlib version: 0.8.1
- Python version: 3.8.12
- OS (Windows, Linux, MacOS): Ubuntu 18.04
- Commit number (optional, please provide it if you are using the dev version):
Additional Notes
Any update on this issue? I saw that the Yahoo website no longer serves Chinese clients. If there is a solution, I will create a PR to fix this problem.
I reproduced the same error here this morning. It looks more like a server-side issue. Did something happen to Yahoo?
@PalanQu @hensejiang We have reproduced the same error and are looking for a solution.
Let me know if you have any ideas.
You can create a PR if you are interested. : )
Sorry for the uninformative reply.
It is not Yahoo's fault. @Wangwuyi123 will try to fix the bug. If you are interested, your PR is very welcome.
Yahoo no longer serves Chinese customers, so if you want to get Yahoo data, using a VPN is a good workaround. I am on Windows, so I set the proxy IP and port number in the command prompt:
set HTTP_PROXY=http://<your_proxy_server_address>:<your_proxy_port>
set HTTPS_PROXY=http://<your_proxy_server_address>:<your_proxy_port>
With these variables set, the download command works. Here is a screenshot of my run, which shows the download proceeding normally.
Note that this setting is temporary: variables set with set only apply to the current command prompt window and are lost once it is closed. To keep the proxy in effect for every command prompt, add HTTP_PROXY and HTTPS_PROXY, with your proxy IP and port number, to the system environment variables.
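The same variables can also be set from Python, which is handy for a wrapper launched by cron or a scheduled task. The failing request in the log goes through urllib3's HTTPSConnectionPool, which suggests the collector uses requests, and requests reads HTTP_PROXY/HTTPS_PROXY from the environment by default, as does the standard library's urllib. A minimal sketch follows. use_proxy is a hypothetical helper, and any address you pass is a placeholder for your own proxy.

```python
import os
import urllib.request
from typing import Dict


def use_proxy(host: str, port: int) -> Dict[str, str]:
    """Set proxy variables for this process (and any child it starts).

    Like `set` in the command prompt, nothing persists after the process exits.
    """
    proxy_url = f"http://{host}:{port}"
    # Set both spellings: Windows treats names case-insensitively, Linux does not.
    for name in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy"):
        os.environ[name] = proxy_url
    # getproxies() checks <scheme>_proxy environment variables first on all platforms.
    return urllib.request.getproxies()
```

Calling use_proxy(...) before starting the collector with subprocess lets the child inherit the proxy, because child processes receive the parent's environment unless an explicit env is passed.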