Handle Yahoo US collector API limit issue (Fix #1953)
Fix eastmoney API pagination for US stock data collection
Description
Fixed pagination issue in _get_eastmoney() function that was causing "request
error" when collecting US stock symbols. Changed from requesting 10,000
symbols per page to proper pagination with 100 symbols per page, iterating
through all pages until completion.
Changes made:
- Changed pz parameter from 10000 to 100 (page size)
- Added proper pagination loop with page increment
- Added exit conditions for API failures, empty responses, or no more data
- Added 0.01s delay between requests for rate limiting
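The changes listed above can be sketched roughly as follows. This is a minimal illustration, not the actual `_get_eastmoney()` code: `collect_symbols` and the injected `fetch_page` callable are hypothetical names, and the sketch assumes the response's `data.diff` field is a list of objects carrying the symbol under `f12` (the field requested in the test commands).

```python
import time
from typing import Callable, List, Optional

PAGE_SIZE = 100  # page size this PR switches to (the pz parameter)


def collect_symbols(
    fetch_page: Callable[[int, int], Optional[dict]],
    page_size: int = PAGE_SIZE,
    delay: float = 0.01,
) -> List[str]:
    """Collect symbols page by page until the API reports no more data.

    fetch_page(page, page_size) is a hypothetical callable that returns the
    decoded JSON response as a dict, or None when the request failed.
    """
    symbols: List[str] = []
    page = 1
    while True:
        payload = fetch_page(page, page_size)
        if not payload:  # exit: API failure or empty response
            break
        data = payload.get("data")
        if not data:  # exit: "data" is null past the last page
            break
        diff = data.get("diff")
        if not diff:  # exit: no more symbols
            break
        # Assumption: diff is a list of {"f12": "<symbol>"} objects.
        symbols.extend(item["f12"] for item in diff)
        page += 1
        time.sleep(delay)  # small pause between requests for rate limiting
    return symbols
```

In practice `fetch_page` would wrap the HTTP call to the eastmoney endpoint. Passing it in keeps the loop itself testable without network access.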
Motivation and Context
Related Issue: eastmoney API now limits page size to 100 symbols maximum, but the original code was trying to fetch 10,000 symbols in a single request.
Problem: Running python collector.py download_data --source_dir ~/.qlib/stock_data/source/us_data --start 2020-01-01 --end 2020-12-31 --delay 1 --interval 1d --region US failed with "request error" because the collector received fewer symbols than its sanity check expects (len(_symbols) < 8000).
Root Cause: The eastmoney API
http://4.push2.eastmoney.com/api/qt/clist/get was returning only 100 symbols
instead of the requested 10,000, triggering the validation error.
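To make the failure mode concrete, here is a minimal sketch of a check of this kind. The 8000 threshold and the "request error" message come from the description above. The function name and the exception type are assumptions for illustration only.

```python
from typing import List

MIN_EXPECTED_SYMBOLS = 8000  # threshold quoted in the problem description


def validate_symbol_count(symbols: List[str], minimum: int = MIN_EXPECTED_SYMBOLS) -> List[str]:
    """Reject a symbol list that is suspiciously short.

    Hypothetical helper: the exception type (ValueError) is an assumption.
    """
    if len(symbols) < minimum:
        raise ValueError("request error")
    return symbols
```

With a single capped response of 100 symbols, a check like this fails. With all 12,095 symbols collected across pages, it passes.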
How Has This Been Tested?
- [x] If you are adding a new feature, test on your own test scripts.
API Endpoint Testing:
- Verified API returns 12,095 total US stocks
- Tested pagination boundaries:
  - Page 1-120: 100 symbols each
  - Page 121: 95 symbols (final page)
  - Page 122+: 0 symbols (empty, properly triggers exit)
- Confirmed total collection: 120×100 + 95 = 12,095 symbols
- Verified no duplicate symbols in collection
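The page boundaries above follow directly from the reported total and the page size. A small sketch of that arithmetic, where `page_layout` is a hypothetical helper name:

```python
from typing import Tuple


def page_layout(total: int, page_size: int) -> Tuple[int, int]:
    """Return (number of non-empty pages, size of the last page)."""
    full_pages, remainder = divmod(total, page_size)
    if remainder:
        # A partial final page follows the full ones.
        return full_pages + 1, remainder
    # Total is an exact multiple of the page size (or zero).
    return full_pages, page_size if full_pages else 0


pages, last_page = page_layout(12095, 100)
print(pages, last_page)  # 121 95
```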
Test Commands Used:
# Test API response structure
curl -L "http://4.push2.eastmoney.com/api/qt/clist/get?pn=1&pz=100&fs=m:105,m:106,m:107&fields=f12" | jq '.data.total'
# Test pagination boundaries
for page in 120 121 122; do
  curl -s -L "http://4.push2.eastmoney.com/api/qt/clist/get?pn=$page&pz=100&fs=m:105,m:106,m:107&fields=f12" | jq '.data.diff | length'
done
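For reference, the same request URLs can be built in Python. This is only a sketch: `build_page_url` is a hypothetical helper, and the query values are copied from the curl commands above (pn is the page number, pz the page size).

```python
from urllib.parse import urlencode

BASE_URL = "http://4.push2.eastmoney.com/api/qt/clist/get"


def build_page_url(page: int, page_size: int = 100) -> str:
    """Build the listing URL for one page, mirroring the curl commands."""
    query = {
        "pn": page,  # page number
        "pz": page_size,  # page size
        "fs": "m:105,m:106,m:107",  # market filter, copied from the curl commands
        "fields": "f12",  # field holding the symbol
    }
    # Keep ':' and ',' unescaped so the URL matches the curl form exactly.
    return f"{BASE_URL}?{urlencode(query, safe=':,')}"
```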
Screenshots of Test Results (if appropriate):
1. Pipeline test: N/A (focused fix, existing pipeline tests should pass)
2. Your own tests:
- API Total Response: {"data":{"total":12095}} ✓
- Page 120: 100 symbols ✓
- Page 121: 95 symbols ✓
- Page 122: 0 symbols (null diff) ✓
- Successfully tested retrieval of all 12,095 US stock symbols
Types of changes
- Fix bugs
- Add new feature
- Update documentation