
Update yahoo data to binary error

Open • PalanQu opened this issue 3 years ago • 5 comments

🐛 Bug Description

I want to update the daily-frequency data automatically. I wrote a crontab job, but it doesn't work, so I tested manually with the following command:

python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir /data/qlib/qlib_data/cn_data --trading_date 2021-06-11 --end_date 2022-01-15

I got the following output:

2022-01-19 16:08:14.819 | INFO     | collector:get_instrument_list:196 - get HS stock symbols......
2022-01-19 16:08:27.547 | INFO     | collector:get_instrument_list:198 - get 4323 symbols.
2022-01-19 16:08:27.558 | INFO     | data_collector.base:collector_data:204 - start collector data......
2022-01-19 16:08:27.558 | INFO     | data_collector.base:collector_data:209 - getting data: 1
  0%|          | 0/4323 [00:00<?, ?it/s]
2022-01-19 16:08:30.050 | WARNING  | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:08:30.051 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:09.459 | WARNING  | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:09.459 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:11.971 | WARNING  | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:11.972 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:16.339 | WARNING  | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:16.340 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:18.811 | WARNING  | collector:get_data_from_remote:141 - 000001.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000001.sz'
2022-01-19 16:09:18.812 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000001.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:18.812 | WARNING  | data_collector.base:save_instrument:164 - 000001.sz is empty
  0%|          | 1/4323 [00:51<61:31:57, 51.25s/it]
2022-01-19 16:09:21.320 | WARNING  | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:21.321 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:23.824 | WARNING  | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:23.825 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:27.271 | WARNING  | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:27.272 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:29.752 | WARNING  | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:29.753 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:32.478 | WARNING  | collector:get_data_from_remote:141 - 000002.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000002.sz'
2022-01-19 16:09:32.479 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000002.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:32.480 | WARNING  | data_collector.base:save_instrument:164 - 000002.sz is empty
  0%|          | 2/4323 [01:04<34:58:51, 29.14s/it]
2022-01-19 16:09:34.966 | WARNING  | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:34.967 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:37.443 | WARNING  | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:37.444 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:39.923 | WARNING  | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:39.924 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:42.416 | WARNING  | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:42.417 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:44.887 | WARNING  | collector:get_data_from_remote:141 - 000004.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000004.sz'
2022-01-19 16:09:44.888 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000004.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:44.888 | WARNING  | data_collector.base:save_instrument:164 - 000004.sz is empty
  0%|          | 3/4323 [01:17<25:48:09, 21.50s/it]
2022-01-19 16:09:47.383 | WARNING  | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:47.383 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:49.907 | WARNING  | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:49.908 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:52.393 | WARNING  | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:52.394 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:54.862 | WARNING  | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:54.863 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:57.343 | WARNING  | collector:get_data_from_remote:141 - 000005.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000005.sz'
2022-01-19 16:09:57.344 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000005.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:09:57.345 | WARNING  | data_collector.base:save_instrument:164 - 000005.sz is empty
  0%|▏         | 4/4323 [01:29<21:30:44, 17.93s/it]
2022-01-19 16:09:59.847 | WARNING  | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:09:59.848 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 1 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:02.340 | WARNING  | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:02.341 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 2 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:04.816 | WARNING  | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:04.817 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 3 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:07.303 | WARNING  | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:07.304 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 4 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:10.044 | WARNING  | collector:get_data_from_remote:141 - 000006.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:'000006.sz'
2022-01-19 16:10:10.045 | WARNING  | data_collector.utils:wrapper:462 - _get_simple: 5 :get data error: 000006.sz--2021-06-11 00:00:00+08:00--2022-01-15 00:00:00+08:00
2022-01-19 16:10:10.045 | WARNING  | data_collector.base:save_instrument:164 - 000006.sz is empty
  0%|▏         | 5/4323 [01:42<19:14:42, 16.04s/it]
2022-01-19 16:10:53.199 | WARNING  | collector:get_data_from_remote:141 - 000007.sz-1d-2021-06-11 00:00:00+08:00-2022-01-15 00:00:00+08:00:HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Max retries exceeded with url: /v8/finance/chart/000007.sz?period1=1623340800&period2=1642176000&interval=1d&events=div%2Csplit&formatted=false&lang=en-US&region=US&corsDomain=finance.yahoo.com (Caused by ReadTimeoutError("HTTPSConnectionPool(host='query2.finance.yahoo.com', port=443): Read timed out. (read timeout=5)"))
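The last log line shows the full failing request. Its period1 and period2 query parameters are Unix timestamps; decoding them in the +08:00 offset that the collector logs in gives exactly 2021-06-11 00:00 and 2022-01-15 00:00, matching --trading_date and --end_date. This suggests the date arguments reach the request unchanged and the failure is on the network side (here a 5-second read timeout). `requested_range` is a hypothetical helper for this check, not part of qlib.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple
from urllib.parse import parse_qs, urlsplit

# Host, path and query copied from the ReadTimeoutError message in the log.
URL = (
    "https://query2.finance.yahoo.com/v8/finance/chart/000007.sz"
    "?period1=1623340800&period2=1642176000&interval=1d"
    "&events=div%2Csplit&formatted=false&lang=en-US&region=US"
    "&corsDomain=finance.yahoo.com"
)

# The collector logs its dates with a +08:00 offset.
CST = timezone(timedelta(hours=8))


def requested_range(url: str) -> Tuple[datetime, datetime]:
    """Decode period1/period2 (Unix seconds) into +08:00 datetimes."""
    query = parse_qs(urlsplit(url).query)
    start = datetime.fromtimestamp(int(query["period1"][0]), tz=CST)
    end = datetime.fromtimestamp(int(query["period2"][0]), tz=CST)
    return start, end


start, end = requested_range(URL)
print(start.isoformat())  # 2021-06-11T00:00:00+08:00
print(end.isoformat())    # 2022-01-15T00:00:00+08:00
```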

To Reproduce

Steps to reproduce the behavior:

1. Run python scripts/data_collector/yahoo/collector.py update_data_to_bin --qlib_data_1d_dir /data/qlib/qlib_data/cn_data --trading_date 2021-06-11 --end_date 2022-01-15
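The thread does not show the crontab entry, so the following is only an assumption about one common pitfall: cron usually starts jobs from the home directory with a minimal environment, so the relative path scripts/data_collector/yahoo/collector.py in the command above will not resolve unless the job first changes into the qlib checkout. This sketch wraps the exact command from this issue in a small Python launcher that sets the working directory explicitly and reuses the interpreter that runs it. `QLIB_REPO`, `build_update_cmd` and `run_update` are hypothetical names, and /path/to/qlib is a placeholder. On its own this would not have fixed this issue, since the manual run in the log above hits the same Yahoo errors.

```python
import subprocess
import sys
from datetime import date
from pathlib import Path
from typing import List

# Hypothetical location of the qlib checkout; adjust to your machine.
QLIB_REPO = Path("/path/to/qlib")
# Data directory taken from the command in this issue.
QLIB_DATA_1D_DIR = "/data/qlib/qlib_data/cn_data"


def build_update_cmd(trading_date: date, end_date: date) -> List[str]:
    """Assemble the update_data_to_bin command used in this issue."""
    return [
        sys.executable,  # the interpreter running this wrapper (the qlib env)
        "scripts/data_collector/yahoo/collector.py",
        "update_data_to_bin",
        "--qlib_data_1d_dir", QLIB_DATA_1D_DIR,
        "--trading_date", trading_date.isoformat(),
        "--end_date", end_date.isoformat(),
    ]


def run_update(trading_date: date, end_date: date) -> int:
    """Run the collector from the repo root and return its exit code."""
    # cwd=QLIB_REPO makes the relative script path resolve from the repo
    # root instead of from wherever cron happened to start the job.
    result = subprocess.run(build_update_cmd(trading_date, end_date), cwd=QLIB_REPO)
    return result.returncode
```

For example, run_update(date(2021, 6, 11), date(2022, 1, 15)) reproduces the manual command from the repo root. In the crontab, call the wrapper with the absolute path of the Python interpreter from the environment where qlib is installed, so that sys.executable points at that same interpreter.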

Expected Behavior

The data is updated successfully.

Screenshot


Environment

Note: you can run cd scripts && python collect_info.py all from the project directory to get the system information and paste it here directly.

  • Qlib version: 0.8.1
  • Python version: 3.8.12
  • OS (Windows, Linux, MacOS): Ubuntu 18.04
  • Commit number (optional, please provide it if you are using the dev version):

Additional Notes

PalanQu, Jan 19 '22

Any update on this issue? I saw that the Yahoo website no longer serves Chinese clients. If there is a solution, I will create a PR to fix this problem.

PalanQu, Jan 24 '22

I reproduced the same error here this morning. It looks more like a server issue. Did something happen to Yahoo?

hensejiang, Jan 26 '22

@PalanQu @hensejiang We reproduced the same error and are looking for a solution.

Let me know if you have any ideas.

You can create a PR if you are interested. : )

Wangwuyi123, Jan 28 '22

Sorry for the uninformative reply.

It is not Yahoo's fault. @Wangwuyi123 will try to fix the bug. If you are interested, your PR is highly welcome.

you-n-g, Jan 30 '22

Yahoo no longer serves Chinese customers. If you want to get Yahoo data, using a VPN or proxy is a good way. I am on Windows, so I set the proxy IP and port number in Command Prompt:

set HTTP_PROXY=http://<your_proxy_server_address>:<your_proxy_port>
set HTTPS_PROXY=http://<your_proxy_server_address>:<your_proxy_port>

After that, the download command works, and in my run the download proceeds normally. Note that this setting is temporary: variables set with set only apply to the current Command Prompt window (and the programs started from it) and are lost when that window is closed. To keep the proxy in effect permanently, add HTTP_PROXY and HTTPS_PROXY with your proxy IP and port to the user or system environment variables.

SunsetWolf, Oct 26 '23
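The proxy workaround from the last comment can also be applied from Python without changing the parent shell, by passing the proxy variables only to the collector's child process. On Linux (the reporter's Ubuntu setup), the shell equivalent of set is export HTTP_PROXY=... and export HTTPS_PROXY=.... The HTTPSConnectionPool / ReadTimeoutError text in the original log comes from urllib3, which suggests a requests-based HTTP client, and requests reads HTTP_PROXY and HTTPS_PROXY from the environment by default; that every request the collector makes honors them is an assumption here. `env_with_proxy` and `run_with_proxy` are hypothetical helpers, and the proxy URL is whatever you would have put in the set commands above.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def env_with_proxy(proxy_url: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with proxy variables added."""
    env = dict(os.environ if base is None else base)
    # Same two variables as the Windows `set` commands in the comment above.
    env["HTTP_PROXY"] = proxy_url
    env["HTTPS_PROXY"] = proxy_url
    return env


def run_with_proxy(cmd: List[str], proxy_url: str, cwd: Optional[str] = None) -> int:
    """Run `cmd` with the proxy set only in the child's environment."""
    # The caller's own environment is left untouched.
    return subprocess.run(cmd, env=env_with_proxy(proxy_url), cwd=cwd).returncode
```

For example, pass the update_data_to_bin command from this issue as cmd, your proxy as proxy_url (e.g. "http://<your_proxy_server_address>:<your_proxy_port>"), and the qlib checkout as cwd. Scoping the variables to the child avoids routing unrelated traffic through the proxy.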